Human-powered Sorts and Joins

Computer Science – Databases

Scientific paper

VLDB2012

Crowdsourcing markets such as Amazon's Mechanical Turk (MTurk) make it possible to assign people small jobs, for example labeling images or looking up phone numbers, through a programmatic interface. MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as the price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduced the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.
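
To see why these three optimizations lower cost, it helps to count the units of crowd work each approach generates. The following is a minimal sketch, assuming a flat price per HIT (human intelligence task) and a fixed number of units packed into each HIT. The function names, parameters, and the attribute used for pre-filtering are hypothetical illustrations and not Qurk's interface, and the numbers are not the paper's experimental figures. Pairwise sorting needs one comparison per unordered pair (quadratic in the number of items), rating needs one judgment per item (linear), batching divides the HIT count by the batch size, and pre-filtering keeps only join candidates whose attribute values agree.

```python
import math


def num_hits(num_units: int, batch_size: int) -> int:
    """HITs needed when up to batch_size units (comparisons or ratings) share one HIT."""
    return math.ceil(num_units / batch_size)


def sort_cost(n_items: int, batch_size: int, price_per_hit_cents: int) -> tuple[int, int]:
    """Return (pairwise_cost, rating_cost) in cents for sorting n_items.

    Pairwise: every unordered pair is compared once, n*(n-1)/2 units.
    Rating: each item receives one numerical rating, n units.
    Assumes a flat, hypothetical price per HIT and one worker per HIT.
    """
    pairs = n_items * (n_items - 1) // 2
    ratings = n_items
    return (num_hits(pairs, batch_size) * price_per_hit_cents,
            num_hits(ratings, batch_size) * price_per_hit_cents)


def candidate_pairs(left, right, key=None):
    """Pairs a crowd join would ask workers to judge.

    Without key, this is the full cross product |left| * |right|.
    With key (a hypothetical pre-filter attribute), only pairs whose
    attribute values agree are kept. The attribute values are assumed
    to be known already; labeling them would itself cost crowd tasks.
    """
    return [(l, r) for l in left for r in right
            if key is None or key(l) == key(r)]


if __name__ == "__main__":
    # 30 items, 5 units per HIT, 1 cent per HIT: 435 comparisons -> 87 HITs,
    # 30 ratings -> 6 HITs.
    print(sort_cost(30, 5, 1))

    # Hypothetical images tagged with a single categorical attribute.
    left = [("a1", "F"), ("a2", "M"), ("a3", "F")]
    right = [("b1", "F"), ("b2", "M")]
    print(len(candidate_pairs(left, right)))                      # full cross product
    print(len(candidate_pairs(left, right, key=lambda x: x[1])))  # after pre-filtering
```

Here the rating-based sort needs 6 HITs instead of 87, and pre-filtering halves the join candidates from 6 to 3. The sketch deliberately ignores accuracy: a rating-based sort or an imperfect pre-filter can trade some quality for the lower cost, which is the trade-off the abstract reports as not substantially affecting accuracy in the authors' experiments.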
