Computer Science – Computation and Language
Scientific paper
1996-07-11
Proceedings WVLC, Copenhagen
14 pages, 2 Postscript figures
We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, considerably reducing the development time needed to construct a tagger. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. Using IGTree has the additional advantage that the optimal context size for disambiguation is computed dynamically.
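The core idea of extrapolating a tag from the most similar stored cases can be illustrated with a minimal sketch. It is not the authors' implementation: the case base, the three-feature case layout (previous tag, focus word, next word), and the names `CASES`, `overlap`, and `tag` are all assumptions made for illustration. It uses a plain unweighted overlap metric and omits IGTree indexing and any feature weighting.

```python
from collections import Counter

# Hypothetical toy case base: (previous tag, focus word, next word) -> tag.
# A real tagger would extract such cases from a tagged corpus.
CASES = [
    (("DET", "run", "was"), "NOUN"),
    (("PRON", "run", "to"), "VERB"),
    (("DET", "walk", "was"), "NOUN"),
    (("PRON", "walk", "home"), "VERB"),
]


def overlap(a, b):
    """Count the feature positions on which two cases agree."""
    return sum(x == y for x, y in zip(a, b))


def tag(features, cases=CASES):
    """Return the majority tag among the stored cases most similar to `features`."""
    best = max(overlap(features, stored) for stored, _ in cases)
    votes = Counter(t for stored, t in cases if overlap(features, stored) == best)
    return votes.most_common(1)[0][0]


# An unseen focus word is tagged from its context alone.
print(tag(("DET", "jog", "was")))
print(tag(("PRON", "run", "home")))
```

In this sketch, learning amounts to appending a new `(features, tag)` pair to the case list, which is one way to read the incremental-learning property listed above, and the nearest cases that determined a decision can be inspected directly.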
Berck Peter
Daelemans Walter
Gillis Steven
Zavrel Jakub
MBT: A Memory-Based Part of Speech Tagger-Generator