Inverse Category Frequency based supervised term weighting scheme for text categorization

Computer Science – Learning

Scientific paper

The paper is withdrawn

Unsupervised term weighting schemes borrowed from the information retrieval (IR) field have been widely used for text categorization (TC), the best known being tf.idf. However, the intuition behind idf seems less reasonable for the TC task than for the IR task, since idf ignores the category labels that are available in TC training data. In this paper, we introduce inverse category frequency (icf) into supervised term weighting schemes and propose a novel icf-based method, which combines icf and relevance frequency (rf) to weight terms in the training dataset. Our experiments on two widely used datasets, the unbalanced Reuters-21578 corpus and the balanced 20 Newsgroup corpus, show that the icf-based supervised term weighting scheme outperforms the tf.rf and prob-based supervised term weighting schemes as well as tf.idf. We also present detailed per-category evaluations of the four term weighting schemes on both datasets in terms of precision, recall and F1 measure.
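The abstract does not state the exact weighting formula. As a minimal sketch, assuming common textbook definitions (idf(t) = log(N / df(t)); icf(t) = log(|C| / cf(t)), where cf(t) is the number of categories containing t; and rf(t, c_k) = log2(2 + a / max(1, c)) as commonly defined for tf.rf, with a and c the numbers of positive- and negative-category documents containing t), the Python below computes each factor on a toy corpus. The function `tf_rf_icf` and its multiplicative combination are hypothetical illustrations, not necessarily the paper's method.

```python
import math
from collections import Counter

# A document is (list_of_tokens, category_label). Definitions below are
# assumed textbook forms, not taken from the (withdrawn) paper.

def idf(term, docs):
    """Unsupervised idf: log(N / df(t)); ignores category labels."""
    df = sum(1 for tokens, _ in docs if term in tokens)
    return math.log(len(docs) / df) if df else 0.0

def icf(term, docs):
    """Inverse category frequency: log(|C| / cf(t)), where cf(t) counts
    the categories having at least one document that contains the term."""
    categories = {label for _, label in docs}
    cf = len({label for tokens, label in docs if term in tokens})
    return math.log(len(categories) / cf) if cf else 0.0

def rf(term, category, docs):
    """Relevance frequency, one-vs-rest: log2(2 + a / max(1, c)), where
    a = docs in `category` containing the term, c = other docs containing it."""
    a = sum(1 for tokens, label in docs if term in tokens and label == category)
    c = sum(1 for tokens, label in docs if term in tokens and label != category)
    return math.log2(2 + a / max(1, c))

def tf_rf_icf(term, doc_tokens, category, docs):
    """Hypothetical combination: raw term frequency * rf * icf."""
    tf = Counter(doc_tokens)[term]
    return tf * rf(term, category, docs) * icf(term, docs)

docs = [
    (["wheat", "price", "export"], "grain"),
    (["wheat", "crop"], "grain"),
    (["oil", "price"], "crude"),
    (["oil", "export", "barrel"], "crude"),
]

print(rf("wheat", "grain", docs))   # 2.0
print(icf("price", docs))           # 0.0 (term occurs in every category)
print(round(tf_rf_icf("wheat", docs[0][0], "grain", docs), 4))  # 1.3863
```

With this plain-log form of icf, a term occurring in every category gets weight zero under the product; a smoothed variant such as log(1 + |C| / cf(t)) avoids that, and which form the paper uses is not stated here.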
