Revisiting k-means: New Algorithms via Bayesian Nonparametrics

Computer Science – Learning

Scientific paper

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

18 pages

Scientific paper

One of the many benefits of Bayesian nonparametric processes such as the Dirichlet process is that they can be used for modeling infinite mixture models, thus providing a flexible answer to the question of how many clusters exist in a data set. For the most part, such flexibility is currently lacking in techniques based on hard clustering, such as k-means, graph cuts, and Bregman hard clustering. For finite mixture models, there is a precise connection between k-means and mixtures of Gaussians, obtained by an appropriate limiting argument. In this paper, we apply a similar technique to an infinite mixture arising from the Dirichlet process (DP). We show that a Gibbs sampling algorithm for DP mixtures approaches a hard clustering algorithm in the limit, and further that the resulting algorithm monotonically minimizes an elegant underlying k-means-like objective that includes a penalty term based on the number of clusters. We generalize our analysis to the case of clustering multiple related data sets through a similar asymptotic argument with the hierarchical Dirichlet process. We discuss additional extensions that further highlight the benefits of our analysis: i) a spectral relaxation involving thresholded eigenvectors, and ii) a normalized cut graph clustering algorithm that requires O(|E|) time per iteration and automatically determines the number of clusters in a graph.

No associations

LandOfFree

Say what you really think

Search LandOfFree.com for scientists and scientific papers. Rate them and share your experience with other people.

Rating

Revisiting k-means: New Algorithms via Bayesian Nonparametrics does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.

If you have personal experience with Revisiting k-means: New Algorithms via Bayesian Nonparametrics, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Revisiting k-means: New Algorithms via Bayesian Nonparametrics will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFWR-SCP-O-327280

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.