Computer Science – Learning
Scientific paper
2011-11-02
Computer Science
Learning
18 pages
Scientific paper
One of the many benefits of Bayesian nonparametric processes such as the Dirichlet process is that they can be used for modeling infinite mixture models, thus providing a flexible answer to the question of how many clusters exist in a data set. For the most part, such flexibility is currently lacking in techniques based on hard clustering, such as k-means, graph cuts, and Bregman hard clustering. For finite mixture models, there is a precise connection between k-means and mixtures of Gaussians, obtained by an appropriate limiting argument. In this paper, we apply a similar technique to an infinite mixture arising from the Dirichlet process (DP). We show that a Gibbs sampling algorithm for DP mixtures approaches a hard clustering algorithm in the limit, and further that the resulting algorithm monotonically minimizes an elegant underlying k-means-like objective that includes a penalty term based on the number of clusters. We generalize our analysis to the case of clustering multiple related data sets through a similar asymptotic argument with the hierarchical Dirichlet process. We discuss additional extensions that further highlight the benefits of our analysis: i) a spectral relaxation involving thresholded eigenvectors, and ii) a normalized cut graph clustering algorithm that requires O(|E|) time per iteration and automatically determines the number of clusters in a graph.
Jordan Michael I.
Kulis Brian
No associations
LandOfFree
Revisiting k-means: New Algorithms via Bayesian Nonparametrics does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.
If you have personal experience with Revisiting k-means: New Algorithms via Bayesian Nonparametrics, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Revisiting k-means: New Algorithms via Bayesian Nonparametrics will most certainly appreciate the feedback.
Profile ID: LFWR-SCP-O-327280