Optimal properties of centroid-based classifiers for very high-dimensional data

Mathematics – Statistics Theory

Scientific paper

Details

Published at http://dx.doi.org/10.1214/09-AOS736 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of

10.1214/09-AOS736

We show that scale-adjusted versions of the centroid-based classifier enjoy optimal properties when used to discriminate between two very high-dimensional populations where the principal differences are in location. The scale adjustment removes the tendency of scale differences to confound differences in means. Certain other distance-based methods, for example, those founded on nearest-neighbor distance, do not have optimal performance in the sense that we propose. Our results permit varying degrees of sparsity and signal strength to be treated, and require only mild conditions on dependence of vector components. Additionally, we permit the marginal distributions of vector components to vary extensively. In addition to providing theory, we explore numerical properties of a centroid-based classifier, and show that these features reflect theoretical accounts of performance.
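
To make the idea of a scale adjustment concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the construction used in the paper. For a fixed point z and a sample of size m with centroid Xbar, the expected value of ||z - Xbar||^2 is ||z - mu_X||^2 + tr(Sigma_X)/m, so the raw squared distance to a centroid is inflated by a term that depends on scale and sample size. The sketch subtracts a plug-in estimate of that term before comparing distances. The function name `scale_adjusted_centroid_classify` and the specific form of the adjustment are assumptions introduced here.

```python
import numpy as np


def scale_adjusted_centroid_classify(z, X, Y):
    """Assign z to the population of X (returns 0) or of Y (returns 1).

    Hypothetical helper, not taken from the paper. X and Y are arrays of
    shape (m, p) and (n, p) holding training samples, with m, n >= 2.
    """
    z = np.asarray(z, dtype=float)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m, n = X.shape[0], Y.shape[0]

    # Squared Euclidean distance from z to each sample centroid.
    d_x = np.sum((z - X.mean(axis=0)) ** 2)
    d_y = np.sum((z - Y.mean(axis=0)) ** 2)

    # Assumed adjustment: estimate tr(Sigma)/sample_size from the
    # componentwise sample variances and subtract it, so that scale
    # differences do not masquerade as location differences.
    adj_x = X.var(axis=0, ddof=1).sum() / m
    adj_y = Y.var(axis=0, ddof=1).sum() / n

    return 0 if (d_x - adj_x) <= (d_y - adj_y) else 1
```

Without the two `adj` terms this reduces to the plain centroid rule. The adjustment matters most when the two populations differ strongly in scale or the two training samples differ strongly in size.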
