Statistics – Machine Learning
Scientific paper
2009-09-12
In the context of clustering, we consider a generative model in a Euclidean ambient space with clusters of different shapes, dimensions, sizes and densities. In an asymptotic setting where the number of points becomes large, we obtain theoretical guarantees for a few emblematic methods based on pairwise distances: a simple algorithm based on the extraction of connected components in a neighborhood graph; the spectral clustering method of Ng, Jordan and Weiss; and hierarchical clustering with single linkage. The methods are shown to enjoy some near-optimal properties in terms of separation between clusters and robustness to outliers. The local scaling method of Zelnik-Manor and Perona is shown to lead to a near-optimal choice for the scale in the first two methods. We also provide a lower bound on the spectral gap to consistently choose the correct number of clusters in the spectral method.
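As an illustration of the first method named in the abstract, the following is a minimal sketch of clustering by connected components of a neighborhood graph. It is not taken from the paper: the function name `neighborhood_graph_clusters`, the fixed `radius` parameter, and the toy data (a one-dimensional segment and a two-dimensional patch) are all assumptions chosen for the example, and the paper's actual procedure and scale selection may differ.

```python
import numpy as np

def neighborhood_graph_clusters(points, radius):
    """Label each point by its connected component in the graph that
    links two points whenever their Euclidean distance is <= radius.
    (Hypothetical helper for illustration; radius is an assumed input.)"""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Dense pairwise distance matrix; acceptable for small n only.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    adjacency = dists <= radius

    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over the neighborhood graph.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adjacency[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy data of mixed dimensions: a 1-D segment and a 2-D patch, far apart.
segment = np.column_stack([np.linspace(0.0, 1.0, 30), np.zeros(30)])
gx, gy = np.meshgrid(np.linspace(5.0, 5.4, 5), np.linspace(5.0, 5.4, 5))
patch = np.column_stack([gx.ravel(), gy.ravel()])
labels = neighborhood_graph_clusters(np.vstack([segment, patch]), radius=0.5)
```

With this toy data the gap between the two groups is much larger than `radius`, so each group forms its own component. In practice the choice of scale matters, which is the role the abstract assigns to local scaling.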
Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions