Harnessing the Deep Web: Present and Future

Computer Science – Databases

Scientific paper


Details

CIDR 2009

Over the past few years, we have built a system that has exposed large volumes of Deep-Web content to Google.com users. The content that our system exposes contributes to more than 1,000 search queries per second and spans over 50 languages and hundreds of domains. The Deep Web has long been acknowledged to be a major source of structured data on the web, and hence accessing Deep-Web content has long been a problem of interest in the data management community. In this paper, we report on where we believe the Deep Web provides value and where it does not. We contrast two very different approaches to exposing Deep-Web content: the surfacing approach that we used, and the virtual integration approach that has often been pursued in the data management literature. We emphasize where the value of each approach lies and caution against potential pitfalls. We outline important areas of future research and, in particular, emphasize the value that can be derived from analyzing large collections of potentially disparate structured data on the web.
