Clustering with Spectral Norm and the k-means Algorithm

Computer Science – Data Structures and Algorithms

Scientific paper

Date: 2010-04-11
arXiv: arxiv.org/abs/1004.1823v1
Abstract: There has been much progress on efficient algorithms for clustering data points generated by a mixture of $k$ probability distributions under the assumption that the means of the distributions are well-separated, i.e., the distance between the means of any two distributions is at least $\Omega(k)$ standard deviations. These results generally make heavy use of the generative model and particular properties of the distributions. In this paper, we show that a simple clustering algorithm works without assuming any generative (probabilistic) model. Our only assumption is what we call a "proximity condition": the projection of any data point onto the line joining its cluster center to any other cluster center is $\Omega(k)$ standard deviations closer to its own center than to the other center. Here the notion of standard deviations is based on the spectral norm of the matrix whose rows represent the difference between a point and the mean of the cluster to which it belongs. We show that in the generative models studied, our proximity condition is satisfied, and so we are able to derive most known results for generative models as corollaries of our main result. We also prove some new results for generative models; for example, we can cluster all but a small fraction of points assuming only a bound on the variance. Our algorithm relies on the well-known $k$-means algorithm, and along the way we prove a result of independent interest: the $k$-means algorithm converges to the "true centers" even in the presence of spurious points, provided the initial (estimated) centers are close enough to the corresponding actual centers and all but a small fraction of the points satisfy the proximity condition. Finally, we present a new technique for boosting the ratio of inter-center separation to standard deviation.
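The proximity condition in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm. It only measures, for a given clustering, how much closer each point's projection is to its own center than to every other center, and compares that margin with a threshold proportional to $k$ spectral-norm "standard deviations". The function names, the normalisation of the spectral norm by $\sqrt{n}$, and the constant `c` are assumptions made for illustration; the abstract does not state them.

```python
import numpy as np

def spectral_sigma(points, labels, centers):
    """Spectral norm of the (point - own center) matrix, scaled by sqrt(n).

    The sqrt(n) scaling is an illustrative assumption, not taken from the abstract.
    """
    residual = points - centers[labels]           # row i: point i minus its cluster center
    return np.linalg.norm(residual, 2) / np.sqrt(points.shape[0])

def proximity_margins(points, labels, centers):
    """margins[i, s]: how much closer the projection of point i, onto the line
    joining its own center to center s, is to its own center than to center s.
    Entries for the point's own cluster are set to +inf."""
    n, k = points.shape[0], centers.shape[0]
    margins = np.full((n, k), np.inf)
    for i in range(n):
        r = labels[i]
        for s in range(k):
            if s == r:
                continue
            direction = centers[s] - centers[r]
            length = np.linalg.norm(direction)
            unit = direction / length
            # Signed coordinate of the projection: 0 at the own center, `length` at center s.
            t = np.dot(points[i] - centers[r], unit)
            margins[i, s] = abs(t - length) - abs(t)
    return margins

def satisfies_proximity(points, labels, centers, c=1.0):
    """Boolean mask of points whose margin to every other center is at least
    c * k * sigma. The constant c is hypothetical."""
    k = centers.shape[0]
    threshold = c * k * spectral_sigma(points, labels, centers)
    return (proximity_margins(points, labels, centers) >= threshold).all(axis=1)

# Toy data: two well-separated clusters on a line.
pts = np.array([[-1.0, 0.0], [1.0, 0.0], [99.0, 0.0], [101.0, 0.0]])
labs = np.array([0, 0, 1, 1])
ctrs = np.array([[0.0, 0.0], [100.0, 0.0]])
sigma = spectral_sigma(pts, labs, ctrs)
margins = proximity_margins(pts, labs, ctrs)
ok = satisfies_proximity(pts, labs, ctrs)
```

On the toy data every point lies far closer to its own center than to the other one, so the mask `ok` is true for all four points. Moving a point toward the midpoint of the two centers shrinks its margin until the check fails.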

Affiliated with

Kannan Ravindran

Kumar Amit
