Computer Science – Learning
Scientific paper
2002-12-08
11 pages, issued 2002
The evaluative character of a word is called its semantic orientation. A positive semantic orientation implies desirability (e.g., "honest", "intrepid") and a negative semantic orientation implies undesirability (e.g., "disturbing", "superfluous"). This paper introduces a simple algorithm for unsupervised learning of semantic orientation from extremely large corpora. The method involves issuing queries to a Web search engine and using pointwise mutual information to analyse the results. The algorithm is empirically evaluated using a training corpus of approximately one hundred billion words -- the subset of the Web that is indexed by the chosen search engine. Tested with 3,596 words (1,614 positive and 1,982 negative), the algorithm attains an accuracy of 80%. The 3,596 test words include adjectives, adverbs, nouns, and verbs. The accuracy is comparable with the results achieved by Hatzivassiloglou and McKeown (1997), using a complex four-stage supervised learning algorithm that is restricted to determining the semantic orientation of adjectives.
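The idea of scoring a word by pointwise mutual information can be illustrated with a minimal sketch. The abstract does not give the exact formula, the reference words, or the query syntax, so everything specific here is an assumption made for illustration: a tiny in-memory list of documents stands in for the search engine, the hypothetical `hits` function stands in for its hit counts, the seed words are invented, and the score is taken to be the word's association with the positive seeds minus its association with the negative seeds.

```python
import math

# Hypothetical seed words; the abstract does not say which words the authors used.
POSITIVE_SEEDS = ["good", "excellent"]
NEGATIVE_SEEDS = ["bad", "poor"]

# Toy corpus standing in for the indexed Web; each document is a set of tokens.
DOCS = [set(text.split()) for text in [
    "honest good excellent service",
    "honest and good people",
    "disturbing bad news",
    "disturbing poor quality bad",
    "good weather",
    "bad weather",
]]

def hits(documents, *terms):
    """Count documents containing every term (stand-in for a search-engine hit count)."""
    return sum(1 for doc in documents if all(term in doc for term in terms))

def pmi(word, seed, documents, smoothing=0.01):
    """Pointwise mutual information, log2(p(word, seed) / (p(word) * p(seed))).

    The joint count is smoothed so that a pair that never co-occurs gives a
    large negative value instead of log(0).
    """
    n = len(documents)
    word_hits = hits(documents, word)
    seed_hits = hits(documents, seed)
    if word_hits == 0 or seed_hits == 0:
        return 0.0  # no evidence either way
    joint = hits(documents, word, seed) + smoothing
    return math.log2(joint * n / (word_hits * seed_hits))

def semantic_orientation(word, documents):
    """Association with positive seeds minus association with negative seeds."""
    positive = sum(pmi(word, seed, documents) for seed in POSITIVE_SEEDS)
    negative = sum(pmi(word, seed, documents) for seed in NEGATIVE_SEEDS)
    return positive - negative

if __name__ == "__main__":
    for word in ("honest", "disturbing"):
        score = semantic_orientation(word, DOCS)
        label = "positive" if score > 0 else "negative"
        print(f"{word}: {score:.2f} ({label})")
```

In this sketch a score above zero is read as a positive orientation and a score below zero as a negative one. With a real search engine, `hits` would be replaced by a query returning the number of matching pages.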
Authors: Michael L. Littman, Peter D. Turney
Title: Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus