Improving Entity Resolution with Global Constraints

Computer Science – Databases

Scientific paper

Details

10 pages, 13 figures

Some of the greatest advances in web search have come from leveraging socio-economic properties of online user behavior. Past advances include PageRank, anchor text, hubs-authorities, and TF-IDF. In this paper, we investigate another socio-economic property that, to our knowledge, has not yet been exploited: sites that create lists of entities, such as IMDB and Netflix, have an incentive to avoid gratuitous duplicates. We leverage this property to resolve entities across different web sites, and find that we can obtain substantial improvements in resolution accuracy. This improvement in accuracy also translates into robustness, which often reduces the amount of training data that must be labeled for comparing entities across many sites. Furthermore, the technique remains robust when resolving sites that contain some duplicates, even without first removing these duplicates. We present algorithms with very strong precision and recall, and show that max weight matching, while appearing to be a natural choice, turns out to have poor performance in some situations. The presented techniques are now being used in the back-end entity resolution system at a major Internet search engine.
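
To illustrate why a no-duplicates assumption can help, the following minimal sketch contrasts accepting every pair above a similarity threshold with a greedy pass that lets each entity match at most once. This is not the paper's algorithm, which the abstract does not specify; the function names, the scores, and the threshold are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (entity on site A, entity on site B)


def threshold_match(scores: Dict[Pair, float], threshold: float) -> List[Pair]:
    """Accept every pair whose score clears the threshold, independently."""
    return sorted(pair for pair, s in scores.items() if s >= threshold)


def greedy_one_to_one(scores: Dict[Pair, float], threshold: float) -> List[Pair]:
    """Accept pairs in descending score order, using each entity at most once.

    The one-to-one restriction encodes the assumption that neither site
    lists the same entity twice.
    """
    used_a, used_b = set(), set()
    matches: List[Pair] = []
    for (a, b), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s < threshold:
            break  # remaining pairs score even lower
        if a in used_a or b in used_b:
            continue  # would imply a duplicate on one of the sites
        used_a.add(a)
        used_b.add(b)
        matches.append((a, b))
    return sorted(matches)


# Hypothetical similarity scores between two movie listings.
scores = {
    ("imdb:Alien (1979)", "netflix:Alien"): 0.95,
    ("imdb:Alien (1979)", "netflix:Aliens"): 0.80,
    ("imdb:Aliens (1986)", "netflix:Aliens"): 0.90,
    ("imdb:Aliens (1986)", "netflix:Alien"): 0.75,
}

independent = threshold_match(scores, 0.7)
constrained = greedy_one_to_one(scores, 0.7)
```

In this toy input, independent thresholding accepts all four pairs, including the two cross matches between "Alien" and "Aliens", while the constrained pass keeps only the two highest-scoring, mutually consistent pairs. A greedy pass is only one way to impose such a constraint, and it is not the max weight matching that the abstract reports as performing poorly in some situations.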
