Treebank
A treebank is a deeply processed corpus in which sentences are segmented into words, tagged with parts of speech, and annotated with their syntactic structure.
Classification of treebanks
Treebanks can be roughly divided into two categories: phrase structure treebanks and dependency structure treebanks.
- Phrase structure treebank: describes a sentence in terms of its structural components (constituents such as noun phrases and verb phrases);
- Dependency structure treebank: describes a sentence in terms of the dependency relations between its words.
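The difference between the two categories can be illustrated with a minimal Python sketch. The example sentence, the Penn Treebank-style labels (S, NP, VP, DT, NN, VBZ), the Universal Dependencies-style relation names (det, nsubj, root), and the helper functions are all assumptions chosen for illustration. They do not come from any particular treebank.

```python
# Illustrative sketch only: the sentence, labels, and helper names are assumptions.

def parse_bracketed(text):
    """Parse a bracketed phrase-structure string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "each constituent must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def leaves(tree):
    """Return the words of a phrase-structure tree in sentence order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Phrase structure: the sentence is described by nested constituents.
phrase_tree = parse_bracketed("(S (NP (DT The) (NN cat)) (VP (VBZ sleeps)))")

# Dependency structure: each word points to its head word.
# Tuple layout: (index, word, head index, relation); head index 0 is an artificial root.
dependencies = [
    (1, "The", 2, "det"),
    (2, "cat", 3, "nsubj"),
    (3, "sleeps", 0, "root"),
]


def dependents_of(head_index, deps):
    """Return the words whose head is the word at head_index."""
    return [word for (_, word, head, _) in deps if head == head_index]


print(leaves(phrase_tree))
print(dependents_of(3, dependencies))
```

Both structures annotate the same words. The phrase structure tree groups them into nested constituents, while the dependency list records only head-dependent links between individual words.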
The role of treebanks
- Provide training data and an evaluation platform for automatic parsers;
- Provide authentic annotated text for syntactic research;
- Serve as the basis for annotating word senses and the semantic relations between words within a sentence.
References
【1】Wang Yuelong, Ji Donghong. A review of Chinese treebanks[J]. Contemporary Linguistics, 2009(1):47-55.