Minimizing Manual Annotation Cost In Supervised Training From Corpora

Computer Science – Computation and Language

Scientific paper

Details

8 pages, uses epsf.sty and aclap.sty, 6 postscript figures

Corpus-based methods for natural language processing often use supervised training, requiring expensive manual annotation of training corpora. This paper investigates methods for reducing annotation cost by sample selection. In this approach, during training the learning program examines many unlabeled examples and selects for labeling (annotation) only those that are most informative at each stage. This avoids redundantly annotating examples that contribute little new information. This paper extends our previous work on committee-based sample selection for probabilistic classifiers. We describe a family of methods for committee-based sample selection, and report experimental results for the task of stochastic part-of-speech tagging. We find that all variants achieve a significant reduction in annotation cost, though their computational efficiency differs. In particular, the simplest method, which has no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.
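
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how committee members are built or how disagreement is measured, so the sketch assumes each member is simply a function from an example to a label, and it assumes vote entropy as the disagreement measure. The names vote_entropy, select_for_annotation, member_a and member_b are hypothetical. With the default threshold of zero and a two-member committee, an example is selected exactly when the two members disagree, which requires no tuning.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy (in bits) of the labels voted by the committee members.

    Assumption: disagreement is measured by vote entropy; the abstract
    does not specify the measure.
    """
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_for_annotation(examples, committee, threshold=0.0):
    """Return the examples whose committee disagreement exceeds threshold.

    `committee` is a list of callables mapping an example to a label.
    Only the returned examples would be sent for manual annotation.
    """
    selected = []
    for example in examples:
        votes = [member(example) for member in committee]
        if vote_entropy(votes) > threshold:
            selected.append(example)
    return selected


# Hypothetical toy committee: each member maps a word to a tag.
def member_a(word):
    return "NOUN" if word in {"dog", "run"} else "VERB"


def member_b(word):
    return "NOUN" if word in {"dog"} else "VERB"


words = ["dog", "run", "eat"]
to_annotate = select_for_annotation(words, [member_a, member_b])
print(to_annotate)  # only "run" is selected: the members disagree on it
```

In this toy run the members agree on "dog" and "eat", so those words are skipped, and only "run" is passed on for labeling. In a real setting the members would be variants of the trained tagger rather than hand-written rules.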
