An Evaluation of Link Neighborhood Lexical Signatures to Rediscover Missing Web Pages

Computer Science – Information Retrieval

Scientific paper


Details

24 pages, 13 figures, 8 tables, technical report


Lexical signatures, which consist of a small number of words chosen to represent the "aboutness" of a page, have previously been proposed for discovering the new URI of a missing web page. However, prior methods relied on computing the lexical signature before the page was lost, or on using cached or archived versions of the page to calculate it. We demonstrate a system for constructing a lexical signature for a page from its link neighborhood, that is, its "backlinks", the pages that link to the missing page. After testing various methods, we show that one can construct a lexical signature for a missing web page using only ten backlink pages. Further, we show that only the first level of backlinks is useful in this effort. The text that the backlinks use to point to the missing page serves as input for creating a four-word lexical signature. That lexical signature is shown to successfully find the target URI in over half of the test cases.
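To make the idea concrete, the following is a minimal sketch of deriving a four-word signature from the anchor text of up to ten backlinks. It is an illustration under stated assumptions, not the report's method: the abstract does not say how terms are weighted, so this sketch simply ranks terms by the number of anchor texts that contain them. The function name `anchor_text_signature`, the `STOP_WORDS` list, and the sample anchor texts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = frozenset({
    "a", "an", "and", "the", "of", "to", "in", "for", "on", "at",
    "is", "this", "here", "click",
})


def anchor_text_signature(anchor_texts, size=4, max_backlinks=10):
    """Return up to `size` terms, ranked by how many anchor texts contain them.

    Assumption: plain document frequency over anchor texts stands in for
    whatever term weighting the report actually uses.
    """
    counts = Counter()
    for text in anchor_texts[:max_backlinks]:
        # Lowercase and split into alphanumeric tokens; count each term
        # once per anchor text.
        terms = set(re.findall(r"[a-z0-9]+", text.lower()))
        counts.update(terms - STOP_WORDS)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:size]]


# Made-up anchor texts from pages that link to a missing page.
anchors = [
    "Hypertext 2006 conference",
    "the Hypertext conference in Odense",
    "ACM Hypertext 2006",
    "click here",
]
signature = anchor_text_signature(anchors)
print(" ".join(signature))
```

The resulting terms would then be submitted as a query to a search engine, and the result list checked for the page's new URI. Counting each term once per anchor text keeps a single verbose backlink from dominating the signature; other weightings are equally plausible.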
