Computer Science – Information Retrieval
Scientific paper
2011-12-23
Supplementary material, including the commented R source code, can be found at http://www.savbb.sk/~grendar/spam/Supplement.html
Instead of the 'bag-of-words' representation, the quantitative profile approach to spam filtering and email categorization represents an email by an m-dimensional vector of numbers, with m fixed in advance. Inspired by Sroufe et al. [Sroufe, P., Phithakkitnukoon, S., Dantu, R., and Cangussu, J. (2010). Email shape analysis. In LNCS, 5935, pp. 18-29], two instances of quantitative profiles are considered: the line profile and the character profile. The performance of these profiles is studied on the TREC 2007 and CEAS 2008 corpora and on a private corpus. At low computational cost, the two quantitative profiles achieve performance that is at least comparable to that of heuristic rules and naive Bayes.
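The abstract does not spell out how the two profiles are computed, so the following Python sketch is only an illustration of what a fixed-length numeric representation of an email might look like. It assumes, hypothetically, that a line profile records the lengths of the first m lines (padded with zeros) and that a character profile counts occurrences of a fixed set of m characters. The function names `line_profile` and `character_profile` and the sample alphabet are inventions for this example and are not taken from the paper or its R supplement.

```python
from collections import Counter


def line_profile(text: str, m: int) -> list[int]:
    """Assumed definition: lengths of the first m lines, zero-padded to m entries."""
    lengths = [len(line) for line in text.splitlines()[:m]]
    return lengths + [0] * (m - len(lengths))


def character_profile(text: str, alphabet: str) -> list[int]:
    """Assumed definition: count of each character of a fixed alphabet (m = len(alphabet))."""
    counts = Counter(text)
    return [counts[ch] for ch in alphabet]


# Hypothetical sample email body, used only to show the shape of the vectors.
email = "Hello,\nBuy now!!!\n\nBye"
print(line_profile(email, 5))            # [6, 10, 0, 3, 0]
print(character_profile(email, "!$@"))   # [3, 0, 0]
```

Under these assumptions every email maps to a vector of the same length regardless of its size, which is what would allow a standard classifier to be trained on the profiles directly.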
Grendar Marian
Škutová J.
Špitalský Vladimír
Paper title: Spam filtering by quantitative profiles