Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach

Computer Science – Computation and Language

Scientific paper

We investigate the performance of two machine learning algorithms in the context of anti-spam filtering. The increasing volume of unsolicited bulk e-mail (spam) has generated a need for reliable anti-spam filters. Filters of this type have so far been based mostly on hand-constructed keyword patterns, which perform poorly. The Naive Bayesian classifier has recently been suggested as an effective method for automatically constructing anti-spam filters with superior performance. We thoroughly investigate the performance of the Naive Bayesian filter on a publicly available corpus, contributing towards standard benchmarks. We also compare the Naive Bayesian filter with an alternative memory-based learning approach, after introducing suitable cost-sensitive evaluation measures. Both methods achieve very accurate spam filtering, clearly outperforming the keyword-based filter of a widely used e-mail reader.
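
The two approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the training messages are invented, the Naive Bayesian model is assumed to be a Bernoulli model with Laplace smoothing, the memory-based learner is assumed to be a plain k-nearest-neighbour vote with Jaccard similarity, and the cost-sensitive rule assumes that blocking a legitimate message is `lam` times as costly as letting a spam message through. Under that assumption a message is blocked only when its spam score exceeds `lam / (1 + lam)`. All function and variable names are hypothetical.

```python
import math
from collections import Counter

# Toy training data, invented for illustration (not the corpus used in the paper).
TRAIN = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("win a free prize now", "spam"),
    ("meeting agenda for monday", "legit"),
    ("project report attached", "legit"),
    ("lunch on monday", "legit"),
]

def tokens(text):
    """Represent a message as the set of lower-cased words it contains."""
    return set(text.lower().split())

def train_naive_bayes(data):
    """Collect the vocabulary, class counts, and per-class word-presence counts."""
    vocab = set()
    for text, _ in data:
        vocab |= tokens(text)
    class_counts = Counter(label for _, label in data)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in data:
        word_counts[label].update(tokens(text))
    return vocab, class_counts, word_counts

def nb_spam_probability(model, text):
    """Posterior P(spam | message) under a Bernoulli Naive Bayes model."""
    vocab, class_counts, word_counts = model
    total = sum(class_counts.values())
    present = tokens(text)
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for word in vocab:
            p = (word_counts[label][word] + 1) / (n + 2)  # Laplace smoothing
            score += math.log(p if word in present else 1.0 - p)
        log_scores[label] = score
    # Normalise in log space to avoid underflow.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["spam"] / sum(weights.values())

def knn_spam_fraction(data, text, k=3):
    """Memory-based score: fraction of the k most similar stored messages that are spam."""
    query = tokens(text)

    def jaccard(other_text):
        other = tokens(other_text)
        union = query | other
        return len(query & other) / len(union) if union else 0.0

    nearest = sorted(data, key=lambda item: jaccard(item[0]), reverse=True)[:k]
    return sum(1 for _, label in nearest if label == "spam") / k

def is_spam(score, lam=1.0):
    """Cost-sensitive decision: block only if the score exceeds lam / (1 + lam)."""
    return score > lam / (1.0 + lam)

if __name__ == "__main__":
    model = train_naive_bayes(TRAIN)
    for message in ("win money", "monday meeting"):
        nb = nb_spam_probability(model, message)
        knn = knn_spam_fraction(TRAIN, message)
        print(message, round(nb, 3), is_spam(nb, lam=9), knn, is_spam(knn, lam=9))
```

Raising `lam` makes either filter more conservative: the same score that blocks a message at `lam=1` may let it through at a much larger `lam`, which is the trade-off a cost-sensitive evaluation is meant to expose.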
