Statistics – Machine Learning
Scientific paper
2007-06-25
Electronic Journal of Statistics 2009, Vol. 3, 76-113
Published at http://dx.doi.org/10.1214/08-EJS289 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by t
10.1214/08-EJS289
In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of $k$ product distributions. We are interested in the case where individual features are of low average quality $\gamma$, and we want to use as few of them as possible to partition the sample correctly. We analyze a spectral technique that approximately optimizes the total data size (the product of the number of data points $n$ and the number of features $K$) needed to perform this partitioning correctly, as a function of $1/\gamma$ for $K>n$. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
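To make the setting in the abstract concrete, here is a minimal sketch of a generic spectral bipartition on wide binary data ($K > n$). It is not the algorithm analyzed in the paper. It assumes only two populations of equal size, and it uses a hypothetical per-feature frequency offset `delta` as a stand-in for weak features, which is not the paper's definition of $\gamma$. All function names and parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_two_populations(n_per_pop, num_features, delta, rng):
    """Draw binary markers for two populations from product distributions.

    Assumption: feature i has frequency base[i] + delta in one population
    and base[i] - delta in the other (which one is chosen at random).
    """
    base = rng.uniform(0.2, 0.8, size=num_features)
    sign = rng.choice([-1.0, 1.0], size=num_features)
    p1 = base + sign * delta
    p2 = base - sign * delta
    x1 = rng.random((n_per_pop, num_features)) < p1
    x2 = rng.random((n_per_pop, num_features)) < p2
    data = np.vstack([x1, x2]).astype(float)
    labels = np.array([0] * n_per_pop + [1] * n_per_pop)
    return data, labels


def spectral_bipartition(data):
    """Split the rows in two by the sign of the top eigenvector of the Gram matrix."""
    centered = data - data.mean(axis=0)   # remove per-feature means
    gram = centered @ centered.T          # n x n, cheap when K >> n
    _, eigvecs = np.linalg.eigh(gram)     # eigenvalues in ascending order
    top = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue
    return (top > 0).astype(int)


def partition_accuracy(pred, truth):
    """Fraction of correctly grouped rows, up to swapping the two group labels."""
    agree = float(np.mean(pred == truth))
    return max(agree, 1.0 - agree)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data, labels = simulate_two_populations(30, 4000, 0.1, rng)
    pred = spectral_bipartition(data)
    print("accuracy:", partition_accuracy(pred, labels))
```

Working with the $n \times n$ Gram matrix rather than the $K \times K$ covariance keeps the eigendecomposition small when features far outnumber data points, which is the regime the abstract describes.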
Blum Avrim
Coja-Oghlan Amin
Frieze Alan
Zhou Shuheng
Separating populations with wide data: A spectral analysis