Similarity-Based Approaches to Natural Language Processing

Computer Science – Computation and Language

Scientific paper

Details

71 pages (single-spaced)

This thesis presents two similarity-based approaches to sparse data problems. The first approach is to build soft, hierarchical clusters: soft, because each event belongs to each cluster with some probability; hierarchical, because cluster centroids are iteratively split to model finer distinctions. Our second approach is a nearest-neighbor approach: instead of calculating a centroid for each class, as in the hierarchical clustering approach, we in essence build a cluster around each word. We compare several such nearest-neighbor approaches on a word sense disambiguation task and find that, as a whole, their performance is far superior to that of standard methods. In another set of experiments, we show that using estimation techniques based on the nearest-neighbor model enables us to achieve perplexity reductions of more than 20 percent over standard techniques in the prediction of low-frequency events, as well as a statistically significant reduction in speech recognition error rate.
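To make the nearest-neighbor idea concrete, here is a minimal sketch of a similarity-based estimate of a bigram probability P(w2 | w1). It is not the thesis's exact formulation. Instead, it assumes that word similarity is derived from the Jensen-Shannon divergence between the words' conditional distributions, turned into a weight via exp(-beta * JS), and that the estimate is a weighted average over the k most similar words. The function names and the parameters `k` and `beta` are hypothetical, chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def conditional_dists(bigrams):
    """Maximum-likelihood P(w2 | w1) from a list of (w1, w2) pairs."""
    counts = defaultdict(Counter)
    for w1, w2 in bigrams:
        counts[w1][w2] += 1
    dists = {}
    for w1, c in counts.items():
        total = sum(c.values())
        dists[w1] = {w2: n / total for w2, n in c.items()}
    return dists


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two sparse distributions."""
    js = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = 0.5 * (pw + qw)
        if pw > 0:
            js += 0.5 * pw * math.log(pw / m)
        if qw > 0:
            js += 0.5 * qw * math.log(qw / m)
    return js


def similarity_estimate(w1, w2, dists, k=2, beta=1.0):
    """Estimate P(w2 | w1) as a similarity-weighted average over w1's k nearest neighbors.

    Assumption: weight(w1, w1') = exp(-beta * JS(P(. | w1), P(. | w1'))).
    """
    target = dists[w1]
    neighbors = []
    for other, dist in dists.items():
        if other == w1:
            continue  # the word itself is not its own neighbor
        weight = math.exp(-beta * js_divergence(target, dist))
        neighbors.append((weight, other))
    neighbors.sort(reverse=True)  # most similar first
    top = neighbors[:k]
    norm = sum(weight for weight, _ in top)
    if norm == 0:
        return 0.0  # no neighbors available
    return sum(weight * dists[o].get(w2, 0.0) for weight, o in top) / norm


if __name__ == "__main__":
    corpus = [("eat", "pasta"), ("eat", "soup"),
              ("devour", "pasta"), ("devour", "pizza"),
              ("drive", "car"), ("drive", "truck")]
    dists = conditional_dists(corpus)
    # ("eat", "pizza") is unseen, but "devour" behaves like "eat" and was seen with "pizza".
    print(similarity_estimate("eat", "pizza", dists, k=1))
```

In this toy corpus the unseen pair ("eat", "pizza") receives probability 0.5 with `k=1`, borrowed from its closest neighbor "devour". A practical model would typically apply such an estimate only where the maximum-likelihood estimate is zero or unreliable, for example inside a back-off scheme. That combination step is not shown here.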
