Treebank (Mitch Marcus, Ann Taylor)
The Penn Treebank Project has produced semantic and syntactic annotations
of naturally occurring text from the Wall Street Journal, Brown, ATIS and Switchboard corpora. The
annotations produced by the Treebank project were published by the LDC (Linguistic Data Consortium).
Two query tools are available for the Treebank: tgrep (at LDC-Online) and CorpusSearch.
The principal advantage of tgrep is its speed; that of CorpusSearch is its ability to pipeline
queries together. Chris Brew has recently developed TreeStyle, an extensible visualisation tool
to aid treebank exploration. Douglas Rohde has developed tgrep2, a more powerful version of
tgrep. Treebanks for other languages are in development, including German, Turkish, Polish,
Czech, Portuguese, Bulgarian, Chinese, ... See also the NEGRA Corpus.
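To illustrate what a tgrep-style query does, here is a minimal Python sketch. It assumes a single tree written in the Treebank's parenthesized bracket notation, parses it, and returns the nodes matching a rough analogue of the tgrep pattern `NP < PP` (an NP node that immediately dominates a PP). The helper names (`parse_bracketed`, `immediately_dominates`, `leaves`) are hypothetical and are not part of tgrep, tgrep2 or CorpusSearch. The real tools offer far richer operators and are built to search whole corpora quickly.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    label: str
    children: list = field(default_factory=list)  # Tree nodes or word strings


def parse_bracketed(s: str) -> Tree:
    """Parse one bracketed tree, e.g. '(S (NP (DT the) (NN dog)) (VP (VBZ barks)))'."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        # An unlabeled outer bracket, e.g. "( (S ...) )", gets the empty label "".
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return Tree(label, children)

    return parse_node()


def subtrees(t: Tree):
    """Yield every node of the tree in pre-order."""
    yield t
    for c in t.children:
        if isinstance(c, Tree):
            yield from subtrees(c)


def leaves(t: Tree):
    """Yield the words under a node, left to right."""
    for c in t.children:
        if isinstance(c, Tree):
            yield from leaves(c)
        else:
            yield c


def immediately_dominates(tree: Tree, parent: str, child: str) -> list:
    """Rough analogue of tgrep's `parent < child`.

    Labels are matched exactly, so a function-tagged label such as NP-SBJ
    will not match "NP" in this sketch.
    """
    return [
        t for t in subtrees(tree)
        if t.label == parent
        and any(isinstance(c, Tree) and c.label == child for c in t.children)
    ]


tree = parse_bracketed(
    "(S (NP (NP (DT the) (NN man)) (PP (IN with) (NP (DT a) (NN telescope)))) (VP (VBD left)))"
)
for match in immediately_dominates(tree, "NP", "PP"):
    print(" ".join(leaves(match)))  # prints "the man with a telescope"
```

Only the outer NP has a PP as a direct child. The NP inside the PP is dominated by that PP but does not itself dominate one, so it is not returned.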