The Hunting of the Bump: On Maximizing Statistical Discrepancy

Computer Science – Computational Geometry

Scientific paper

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

11 pages. A short version of this paper will appear in SODA06. This full version contains an additional short appendix

Scientific paper

Anomaly detection has important applications in biosurveilance and environmental monitoring. When comparing measured data to data drawn from a baseline distribution, merely, finding clusters in the measured data may not actually represent true anomalies. These clusters may likely be the clusters of the baseline distribution. Hence, a discrepancy function is often used to examine how different measured data is to baseline data within a region. An anomalous region is thus defined to be one with high discrepancy. In this paper, we present algorithms for maximizing statistical discrepancy functions over the space of axis-parallel rectangles. We give provable approximation guarantees, both additive and relative, and our methods apply to any convex discrepancy function. Our algorithms work by connecting statistical discrepancy to combinatorial discrepancy; roughly speaking, we show that in order to maximize a convex discrepancy function over a class of shapes, one needs only maximize a linear discrepancy function over the same set of shapes. We derive general discrepancy functions for data generated from a one- parameter exponential family. This generalizes the widely-used Kulldorff scan statistic for data from a Poisson distribution. We present an algorithm running in $O(\smash[tb]{\frac{1}{\epsilon} n^2 \log^2 n})$ that computes the maximum discrepancy rectangle to within additive error $\epsilon$, for the Kulldorff scan statistic. Similar results hold for relative error and for discrepancy functions for data coming from Gaussian, Bernoulli, and gamma distributions. Prior to our work, the best known algorithms were exact and ran in time $\smash[t]{O(n^4)}$.

No associations

LandOfFree

Say what you really think

Search LandOfFree.com for scientists and scientific papers. Rate them and share your experience with other people.

Rating

The Hunting of the Bump: On Maximizing Statistical Discrepancy does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.

If you have personal experience with The Hunting of the Bump: On Maximizing Statistical Discrepancy, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and The Hunting of the Bump: On Maximizing Statistical Discrepancy will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFWR-SCP-O-134048

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.