Evaluating Methods to Rediscover Missing Web Pages from the Web Infrastructure

Computer Science – Information Retrieval

Scientific paper


Details

10 pages, 11 figures, 5 tables, 40 references, accepted for publication at JCDL 2010 in Brisbane, Australia

Missing web pages (pages that return the 404 "Page Not Found" error) are a common part of the browsing experience. Manually using search engines to rediscover missing pages can be frustrating and often fails. We compare four automated methods for rediscovering web pages: we extract the page's title, generate the page's lexical signature (LS), obtain the page's tags from the bookmarking website delicious.com, and generate an LS from the page's link neighborhood. We use the output of each method to query Internet search engines and analyze their retrieval performance. Our results show that both LSs and titles perform fairly well, with over 60% of URIs returned as the top-ranked result by Yahoo!. Combining methods improves retrieval performance further. Given the complexity of LS generation, the preferable setup is to query the title first and, if the results are insufficient, to query the LS second. This combination returns more than 75% of URIs as the top-ranked result.
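The abstract mentions two technical ideas without detail: a lexical signature (a handful of terms that characterize a page) and a title-first, LS-second query strategy. The sketch below illustrates both under stated assumptions. It is not the authors' implementation. It assumes TF-IDF term weighting with document frequencies estimated from a small local corpus (a real system would use search-engine index counts instead). The helper names `lexical_signature` and `rediscover` are hypothetical, as is the `search` callable, which stands in for a search-engine API. "Insufficient results" is assumed here to mean that the target URL is not returned top-ranked.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page: str, corpus: List[str], k: int = 5) -> List[str]:
    """Hypothetical helper: return the k terms of `page` with the highest TF-IDF.

    Document frequency is estimated from `corpus`, an assumed stand-in for
    the index-wide counts a search engine would provide.
    """
    tf = Counter(tokenize(page))
    n_docs = len(corpus)
    df: Counter = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each term once per document

    def tfidf(term: str) -> float:
        # +1 smoothing keeps the IDF finite for terms missing from the corpus
        return tf[term] * math.log((n_docs + 1) / (df[term] + 1))

    # Highest weight first; ties broken alphabetically for determinism
    ranked = sorted(tf, key=lambda t: (-tfidf(t), t))
    return ranked[:k]


def rediscover(
    url: str,
    title: str,
    ls_terms: List[str],
    search: Callable[[str], List[str]],  # assumed: query -> ranked URL list
) -> Optional[str]:
    """Hypothetical helper: query the title first, then the LS as a fallback.

    Returns the query that ranked `url` first, or None if neither did.
    """
    for query in (title, " ".join(ls_terms)):
        results = search(query)
        if url in results[:1]:  # success means the URL is top-ranked
            return query
    return None
```

For example, with a toy corpus, `lexical_signature("apple apple banana cherry", ["apple pie", "banana bread", "banana split", "apple apple banana cherry"], k=2)` ranks `apple` above `cherry` because `apple` occurs twice on the page, while `banana` is too common in the corpus to make the cut. The fallback order mirrors the abstract's reasoning: a title is cheap to obtain, so the costlier LS is computed and queried only when the title query misses.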
