An Improved k-Nearest Neighbor Algorithm for Text Categorization

Computer Science – Computation and Language

Scientific paper

7 pages, 2 tables, 2 figures, to appear in the Proceedings of the 20th International Conference on Computer Processing of Orie

k is the most important parameter in a text categorization system based on the k-Nearest Neighbor algorithm (kNN). In the classification process, the k training documents nearest to the test document are determined first. Then the prediction is made according to the category distribution among these k nearest neighbors. Generally speaking, the class distribution in the training set is uneven: some classes may have more samples than others. As a result, system performance is very sensitive to the choice of k, and a fixed k value is likely to bias predictions toward large categories. To deal with these problems, we propose an improved kNN algorithm that uses different numbers of nearest neighbors for different categories rather than a fixed number across all categories. A category with more samples in the training set uses more samples (nearest neighbors) when deciding whether a test document belongs to it. Preliminary experiments on Chinese text categorization show that our method is less sensitive to the parameter k than the traditional one, and that it can properly classify documents belonging to smaller classes even with a large k. The method is promising in cases where estimating the parameter k via cross-validation is not possible.
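
The abstract does not state how the per-category neighbor counts are chosen, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes documents are sparse term-weight dicts compared by cosine similarity, and it uses a hypothetical rule k_c = ceil(k * N(c) / max N), which gives the largest category the full k and smaller categories proportionally fewer neighbors. All function names are invented for illustration. The traditional fixed-k vote is included for comparison.

```python
import math
from collections import Counter

# Assumption: a document is a sparse vector mapping term -> weight (e.g. TF-IDF).
Doc = dict

def cosine(a: Doc, b: Doc) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_neighbors(test: Doc, train: list) -> list:
    """Return (similarity, label) for every training document, most similar first."""
    scored = [(cosine(test, vec), label) for vec, label in train]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

def classify_fixed_k(test: Doc, train: list, k: int) -> str:
    """Traditional kNN: the same k neighbors for every class, similarity-weighted vote."""
    scores = Counter()
    for sim, label in rank_neighbors(test, train)[:k]:
        scores[label] += sim
    return max(scores, key=scores.get)

def per_class_k(train: list, k: int) -> dict:
    """HYPOTHETICAL rule: the largest class uses k neighbors and smaller
    classes use proportionally fewer (at least one)."""
    sizes = Counter(label for _, label in train)
    largest = max(sizes.values())
    return {c: max(1, math.ceil(k * n / largest)) for c, n in sizes.items()}

def classify_class_dependent_k(test: Doc, train: list, k: int) -> str:
    """Score each class c on its own top-k_c neighbors. Dividing by the total
    similarity of those neighbors (an assumption) keeps a class that looks at
    more neighbors from winning by sheer count."""
    ranked = rank_neighbors(test, train)
    scores = {}
    for c, k_c in per_class_k(train, k).items():
        top = ranked[:k_c]
        total = sum(sim for sim, _ in top)
        in_class = sum(sim for sim, label in top if label == c)
        scores[c] = in_class / total if total else 0.0
    return max(scores, key=scores.get)

if __name__ == "__main__":
    # Toy data (invented): 2 "small" documents and 8 "big" documents.
    train = [({"a": 1.0}, "small")] * 2 + [({"b": 1.0}, "big")] * 8
    test = {"a": 1.0, "b": 0.5}
    print(per_class_k(train, 10))                       # {'small': 3, 'big': 10}
    print(classify_fixed_k(test, train, 10))            # big
    print(classify_class_dependent_k(test, train, 10))  # small
```

On this toy data the test document is closest to the two "small" documents, yet with k = 10 the fixed-k vote is swamped by the eight "big" neighbors. The class-dependent variant judges "small" on only its top 3 neighbors and still returns "small", which mirrors the behavior the abstract describes for smaller classes under a large k.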