Computer Science – Computation and Language
Scientific paper
2000-07-13
Proceedings of the Second International Conference on Language Resources and Evaluation, pp. 427-433, Paris: European Language
7 pages, 2 figures
This paper discusses the challenges that arise when large speech corpora receive an ever-broadening range of diverse and distinct annotations. Two case studies of this process are presented: the Switchboard Corpus of telephone conversations and the TDT2 corpus of broadcast news. Switchboard has undergone two independent transcriptions and various types of additional annotation, all carried out as separate projects that were dispersed both geographically and chronologically. The TDT2 corpus has also received a variety of annotations, but all directly created or managed by a core group. In both cases, issues arise involving the propagation of repairs, consistency of references, and the ability to integrate annotations having different formats and levels of detail. We describe a general framework whereby these issues can be addressed successfully.
Bird Steven
Graff David
Many uses, many annotations for large speech corpora: Switchboard and TDT as case studies