Machine Learning in Automated Text Categorization

Computer Science – Information Retrieval

Scientific paper

Details

Accepted for publication in ACM Computing Surveys

The automated categorization (or classification) of texts into predefined categories has attracted rapidly growing interest over the last ten years, driven by the increased availability of documents in digital form and the ensuing need to organize them. In the research community, the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (in which domain experts define a classifier manually) are very good effectiveness, considerable savings in expert labor, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We discuss in detail issues pertaining to three problems: document representation, classifier construction, and classifier evaluation.
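
The three problems named in the abstract can be illustrated with a minimal sketch. It is not taken from the survey: the bag-of-words representation, the multinomial Naive Bayes learner with add-one smoothing, the function names, and the toy documents are all assumptions chosen only to keep the example short and self-contained.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Document representation: a lowercase bag of words (assumed, minimal choice)."""
    return text.lower().split()

def train(labeled_docs):
    """Classifier construction: collect multinomial Naive Bayes counts
    from preclassified (text, category) pairs."""
    doc_counts = Counter()              # documents seen per category
    word_counts = defaultdict(Counter)  # token counts per category
    vocab = set()
    for text, category in labeled_docs:
        tokens = tokenize(text)
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab

def classify(model, text):
    """Return the category with the highest smoothed log-probability."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in sorted(doc_counts):
        score = math.log(doc_counts[category] / total_docs)
        denom = sum(word_counts[category].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[category][token] + 1) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category

def precision_recall(model, test_docs, category):
    """Classifier evaluation: precision and recall for one category."""
    tp = fp = fn = 0
    for text, true_category in test_docs:
        predicted = classify(model, text)
        if predicted == category and true_category == category:
            tp += 1
        elif predicted == category:
            fp += 1
        elif true_category == category:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy data, invented for this illustration only.
TRAIN = [
    ("stocks fell as markets closed lower", "finance"),
    ("the bank raised interest rates", "finance"),
    ("the team won the final match", "sport"),
    ("the striker scored a late goal", "sport"),
]
TEST = [
    ("markets and interest rates", "finance"),
    ("a goal in the final match", "sport"),
]

model = train(TRAIN)
print(classify(model, "markets and interest rates"))
print(precision_recall(model, TEST, "finance"))
```

The test documents are kept apart from the training documents, since evaluating on documents the classifier has already seen would overstate its effectiveness. A realistic system would use a far larger preclassified collection and a richer representation.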
