Incrementally Maintaining Classification using an RDBMS

Computer Science – Databases

Scientific paper

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

VLDB2011

Scientific paper

The proliferation of imprecise data has motivated both researchers and the database industry to push statistical techniques into relational database management systems (RDBMSs). We study algorithms to maintain model-based views for a popular statistical technique, classification, inside an RDBMS in the presence of updates to the training examples. We make three technical contributions: (1) An algorithm that incrementally maintains classification inside an RDBMS. (2) An analysis of the above algorithm that shows that our algorithm is optimal among all deterministic algorithms (and asymptotically within a factor of 2 of a nondeterministic optimal). (3) An index structure based on the technical ideas that underlie the above algorithm which allows us to store only a fraction of the entities in memory. We apply our techniques to text processing, and we demonstrate that our algorithms provide several orders of magnitude improvement over non-incremental approaches to classification on a variety of data sets: such as the Cora, UCI Machine Learning Repository data sets, Citeseer, and DBLife.

No associations

LandOfFree

Say what you really think

Search LandOfFree.com for scientists and scientific papers. Rate them and share your experience with other people.

Rating

Incrementally Maintaining Classification using an RDBMS does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.

If you have personal experience with Incrementally Maintaining Classification using an RDBMS, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Incrementally Maintaining Classification using an RDBMS will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFWR-SCP-O-262406

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.