Statistics – Methodology
Scientific paper
2011-04-14
22 pages, 5 figures
Inspired by Random Forests (RF) in the context of classification, we propose a new clustering ensemble method---Cluster Forests (CF). Geometrically, CF randomly probes a high-dimensional data cloud to obtain "good local clusterings" and then aggregates them via spectral clustering to obtain cluster assignments for the whole dataset. The search for good local clusterings is guided by a cluster quality measure $\kappa$. CF progressively improves each local clustering in a fashion that resembles tree growth in RF. Empirical studies on several real-world datasets under two different performance metrics show that CF compares favorably to its competitors. Theoretical analysis shows that the $\kappa$ criterion grows each local clustering in a desirable way---it is "noise-resistant." A closed-form expression is obtained for the mis-clustering rate of spectral clustering under a perturbation model, which yields new insights into some aspects of spectral clustering.
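The probe-then-aggregate idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each local clustering is a plain k-means run on a random feature subset, that the local results are pooled into a co-association matrix (the fraction of runs in which two points share a cluster), and that the final step is a basic normalized spectral clustering of that matrix. The $\kappa$-guided growth of each local clustering is left out because the abstract does not define $\kappa$. The names `kmeans`, `co_association`, `spectral_labels` and `cluster_ensemble`, and all parameter values, are hypothetical.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Lloyd's k-means with farthest-point initialisation; returns labels."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Next seed: the point farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old center
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def co_association(X, k, n_trees, n_feats, rng):
    """Fraction of local clusterings in which each pair of points co-occurs."""
    n, p = X.shape
    A = np.zeros((n, n))
    for _ in range(n_trees):
        # Assumption: a "local clustering" is k-means on a random feature subset.
        feats = rng.choice(p, size=n_feats, replace=False)
        labels = kmeans(X[:, feats], k, rng)
        A += labels[:, None] == labels[None, :]
    return A / n_trees


def spectral_labels(A, k, rng):
    """Basic normalized spectral clustering of an affinity matrix A."""
    d = A.sum(axis=1)  # positive, since every point co-occurs with itself
    M = A / np.sqrt(np.outer(d, d))  # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    V = vecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    V = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
    return kmeans(V, k, rng)


def cluster_ensemble(X, k, n_trees=20, n_feats=2, seed=0):
    """Aggregate random local clusterings into one global assignment."""
    rng = np.random.default_rng(seed)
    A = co_association(X, k, n_trees, n_feats, rng)
    return spectral_labels(A, k, rng)


if __name__ == "__main__":
    # Synthetic data: two blobs separated in the first four of six features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 6))
    X[30:, :4] += 8.0
    print(cluster_ensemble(X, k=2, n_trees=20, n_feats=3, seed=1))
```

Pooling through a co-association matrix is one common way to combine clusterings whose label names do not agree across runs; the paper's actual aggregation and feature-selection rules may differ from this stand-in.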
Chen Aiyou
Jordan Michael I.
Yan Donghui