LDC (David Graff, Chris Cieri, Mark Liberman)
The Linguistic Data Consortium has developed a
range of (mainly SGML-based) formats for the transcripts and other types of annotation that it has
published. (See below for NIST's UTF format, which provides a combined framework for
several of these existing formats.) Some online documentation is available for individual
corpora, authored at different times by different groups (e.g. Switchboard
at TI in 1991, Trains at Rochester in
1992-3), as well as a general SGML transcription specification currently used for
(orthographic) transcription of telephone conversations and broadcast news recordings. The LDC
has also implemented a general data model for searching annotated text and speech corpora
online, via LDC-Online.