Data Partitioning for Parallel Entity Matching

Date: 2010-06-28
Source: arxiv.org/abs/1006.5309v1
Subject: Computer Science
Category: Distributed, Parallel, and Cluster Computing
Length: 11 pages
Type: Scientific paper
Abstract: Entity matching is an important and difficult step for integrating web data. To reduce the typically high execution time for matching, we investigate how entity matching can be performed in parallel on a distributed infrastructure. We propose different strategies to partition the input data and generate multiple match tasks that can be executed independently. One of our strategies supports both blocking, to reduce the search space for matching, and parallel matching, to improve efficiency. Special attention is given to the number and size of data partitions, as they impact the overall communication overhead and the memory requirements of individual match tasks. We have developed a service-based distributed infrastructure for the parallel execution of match workflows. We evaluate our approach in detail for different match strategies on real-world product data from different web shops. We also consider caching of input entities and affinity-based scheduling of match tasks.
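The idea of partitioning the input and generating independent match tasks can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a single input set compared against itself, splits it into fixed-size partitions, and creates one task per unordered pair of partitions. The names size_based_partitions, match_tasks, and pairs_in_task, and the max_size parameter, are hypothetical and chosen only for this illustration.

```python
from itertools import combinations, combinations_with_replacement, product
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def size_based_partitions(entities: Sequence[T], max_size: int) -> List[Sequence[T]]:
    """Split the input into consecutive partitions of at most max_size entities."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [entities[i:i + max_size] for i in range(0, len(entities), max_size)]


def match_tasks(num_partitions: int) -> Iterator[Tuple[int, int]]:
    """Yield one task (i, j) with i <= j per unordered pair of partitions.

    A task (i, i) compares a partition with itself; a task (i, j) with i < j
    compares two different partitions. Tasks share no state, so they could be
    executed independently of each other.
    """
    return combinations_with_replacement(range(num_partitions), 2)


def pairs_in_task(partitions: Sequence[Sequence[T]], i: int, j: int) -> Iterator[Tuple[T, T]]:
    """Enumerate the entity pairs that a single match task has to compare."""
    if i == j:
        # Within one partition: every unordered pair exactly once.
        return combinations(partitions[i], 2)
    # Across two partitions: the full Cartesian product.
    return product(partitions[i], partitions[j])


if __name__ == "__main__":
    entities = list(range(10))  # stand-in for entity records (assumption)
    parts = size_based_partitions(entities, max_size=4)
    for i, j in match_tasks(len(parts)):
        n_pairs = sum(1 for _ in pairs_in_task(parts, i, j))
        print(f"task ({i}, {j}): {n_pairs} comparisons")
```

Under these assumptions, p partitions yield p(p+1)/2 match tasks. Smaller partitions lower the memory needed per task but increase the number of tasks, which is the kind of trade-off between partition number and partition size that the abstract points to.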

Affiliated with

Groß Anika

Hartung Michael

Kirsten Toralf

Kolb Lars

Köpcke Hanna
