Classifying extremely imbalanced data sets

Physics – Data Analysis – Statistics and Probability

Scientific paper


Imbalanced data sets containing far more background than signal instances are very common in particle physics, and will also be characteristic of the upcoming analyses of LHC data. Following up on the work presented at ACAT 2008, we apply the multivariate technique presented there (a rule-growing algorithm combined with the meta-methods bagging and instance weighting) to much more imbalanced data sets, in particular a selection of D0 decays without the use of particle identification. It turns out that the quality of the result depends strongly on the number of background instances used for training. We discuss methods to exploit this dependence in order to improve the results significantly, and how, in general, to handle and reduce the size of large training sets without loss of result quality. We also comment on how to take statistical fluctuations into account in receiver operating characteristic (ROC) curves when comparing classifier methods.
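The abstract does not say how the training set is reduced or how ROC fluctuations are estimated, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes that background is thinned by random subsampling with a compensating instance weight, and that the statistical spread of a ROC summary (here the area under the curve) is estimated with a generic stratified bootstrap. The function names `subsample_background`, `auc`, and `bootstrap_auc`, and the toy Gaussian scores, are hypothetical and chosen for illustration.

```python
import random
import statistics


def subsample_background(labels, keep_fraction, rng):
    """Keep every signal instance (label 1) with weight 1 and each background
    instance (label 0) with probability keep_fraction and weight 1/keep_fraction,
    so the weighted background yield is preserved on average.
    Assumption: plain random subsampling, not the paper's method."""
    indices, weights = [], []
    for i, y in enumerate(labels):
        if y == 1:
            indices.append(i)
            weights.append(1.0)
        elif rng.random() < keep_fraction:
            indices.append(i)
            weights.append(1.0 / keep_fraction)
    return indices, weights


def auc(scores, labels):
    """Area under the ROC curve as the fraction of (signal, background) pairs
    in which the signal instance has the higher score; ties count one half."""
    sig = [s for s, y in zip(scores, labels) if y == 1]
    bkg = [s for s, y in zip(scores, labels) if y == 0]
    if not sig or not bkg:
        raise ValueError("need at least one signal and one background instance")
    wins = 0.0
    for s in sig:
        for b in bkg:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(sig) * len(bkg))


def bootstrap_auc(scores, labels, n_resamples=50, seed=0):
    """Mean and standard deviation of the AUC over bootstrap replicas.
    Signal and background are resampled separately (stratified) so that
    every replica contains both classes."""
    rng = random.Random(seed)
    sig_idx = [i for i, y in enumerate(labels) if y == 1]
    bkg_idx = [i for i, y in enumerate(labels) if y == 0]
    values = []
    for _ in range(n_resamples):
        idx = [rng.choice(sig_idx) for _ in sig_idx]
        idx += [rng.choice(bkg_idx) for _ in bkg_idx]
        values.append(auc([scores[i] for i in idx], [labels[i] for i in idx]))
    return statistics.mean(values), statistics.stdev(values)


# Toy data (hypothetical): 20 signal and 500 background classifier scores.
rng = random.Random(1)
labels = [1] * 20 + [0] * 500
scores = [rng.gauss(1.0, 1.0) if y == 1 else rng.gauss(0.0, 1.0) for y in labels]

kept, weights = subsample_background(labels, keep_fraction=0.2, rng=rng)
mean_auc, sd_auc = bootstrap_auc(scores, labels)
print(f"kept {len(kept)} of {len(labels)} instances")
print(f"AUC = {mean_auc:.3f} +/- {sd_auc:.3f}")
```

When two classifiers are compared on the same test sample, a difference in AUC smaller than the bootstrap spread would not, under these assumptions, be evidence that one method is better than the other.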
