PageRank optimization applied to spam detection

Mathematics – Optimization and Control

Scientific paper

Details

8 pages, 6 figures

We give a new link spam detection and PageRank demotion algorithm called MaxRank. Like TrustRank and AntiTrustRank, it starts from a seed of hand-picked trusted and spam pages. We define the MaxRank of a page as the frequency with which the page is visited by a random surfer who minimizes an average cost per time unit. On a given page, the random surfer selects a set of hyperlinks and clicks with uniform probability on any of them. The cost function penalizes spam pages and hyperlink removals, and the goal is to determine a hyperlink deletion policy that minimizes this average cost. The MaxRank is interpreted as a modified PageRank vector and is used to sort web pages in place of the usual PageRank vector. The bias vector of this ergodic control problem, which is unique up to an additive constant, measures the "spamicity" of each page and is used to detect spam pages. We give a scalable algorithm for MaxRank computation, which allowed us to run experiments on the WEBSPAM-UK2007 dataset. We show that our algorithm outperforms both TrustRank and AntiTrustRank for spam and nonspam page detection.
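The control problem described in the abstract can be illustrated on a toy graph. The sketch below is not the paper's scalable algorithm; it is a minimal illustration under several assumptions that do not come from the abstract: a damping factor ALPHA with uniform teleportation, a per-link removal cost GAMMA, seed costs of -1 for trusted and +1 for spam pages, and a hypothetical four-page graph. With teleportation, the average-cost problem reduces to a discounted fixed point, which plain value iteration can solve; the fixed point serves as a bias vector (up to an additive constant), and the stationary distribution under the resulting link deletion policy plays the role of MaxRank.

```python
ALPHA = 0.85   # damping factor (assumed; the usual PageRank value)
GAMMA = 0.1    # cost charged per removed hyperlink (assumed)

# Hypothetical toy web graph: page -> list of outgoing hyperlinks.
LINKS = {0: [1, 2], 1: [0, 3], 2: [0], 3: [2]}
# Assumed seed costs: page 0 is trusted (-1), page 3 is spam (+1).
PAGE_COST = {0: -1.0, 1: 0.0, 2: 0.0, 3: 1.0}


def best_kept_links(page, v):
    """Return (value, kept links) minimizing the one-step cost for `page`.

    For a fixed number k of kept links, the cheapest choice is the k
    targets with the smallest v, so only len(links) candidates are tried.
    """
    ranked = sorted(LINKS[page], key=lambda j: v[j])
    best = None
    for k in range(1, len(ranked) + 1):  # at least one link must remain
        kept = ranked[:k]
        value = (PAGE_COST[page]
                 + GAMMA * (len(ranked) - k)
                 + ALPHA * sum(v[j] for j in kept) / k)
        if best is None or value < best[0]:
            best = (value, kept)
    return best


def solve_bias(iterations=500):
    """Value iteration on the discounted fixed point v = T(v)."""
    v = {i: 0.0 for i in LINKS}
    for _ in range(iterations):
        v = {i: best_kept_links(i, v)[0] for i in LINKS}
    policy = {i: sorted(best_kept_links(i, v)[1]) for i in LINKS}
    return v, policy


def stationary(links, iterations=300):
    """Power iteration for a random surfer with uniform teleportation."""
    n = len(links)
    pi = {i: 1.0 / n for i in links}
    for _ in range(iterations):
        nxt = {i: (1.0 - ALPHA) / n for i in links}
        for i, outs in links.items():
            for j in outs:
                nxt[j] += ALPHA * pi[i] / len(outs)
        pi = nxt
    return pi


bias, policy = solve_bias()
pagerank = stationary(LINKS)    # usual PageRank on the full graph
maxrank = stationary(policy)    # visit frequencies under the deletion policy

for page in LINKS:
    print(page, policy[page], round(bias[page], 3),
          round(pagerank[page], 4), round(maxrank[page], 4))
```

In this toy instance the policy removes the only hyperlink pointing to the spam page, so that page is reached by teleportation alone and its score drops below its ordinary PageRank, while its bias value is the largest of the four pages.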
