Spectral clustering based on local linear approximations

Statistics – Machine Learning

Scientific paper

DOI: 10.1214/11-EJS651

In the context of clustering, we assume a generative model where each cluster is the result of sampling points in the neighborhood of an embedded smooth surface; the sample may be contaminated with outliers, which are modeled as points sampled in space away from the clusters. We consider a prototype for a higher-order spectral clustering method based on the residual from a local linear approximation. We obtain theoretical guarantees for this algorithm and show that, in terms of both separation and robustness to outliers, it outperforms the standard spectral clustering algorithm (based on pairwise distances) of Ng, Jordan and Weiss (NIPS '01). The optimal choice for some of the tuning parameters depends on the dimension and thickness of the clusters. We provide estimators that come close enough for our theoretical purposes. We also discuss the cases of clusters of mixed dimensions and of clusters that are generated from smoother surfaces. In our experiments, this algorithm is shown to outperform pairwise spectral clustering on both simulated and real data.
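To make the idea of an affinity built from local linear residuals concrete, here is a minimal sketch in Python. It is not the algorithm analyzed in the paper, whose exact definition the abstract does not give. It is a simplified pairwise stand-in: each point's neighborhood is fitted with a d-dimensional affine subspace by PCA, neighbors are weighted by their residual to that fit, and the resulting affinity is fed to the normalized spectral step of Ng, Jordan and Weiss. The function names and the parameters d (assumed intrinsic dimension), r (neighborhood radius), eta (residual scale) and K (number of clusters) are assumptions made for illustration, as is the small deterministic k-means used for the final step.

```python
import numpy as np


def residual_affinity(X, d, r, eta):
    """Affinity from residuals to a local d-dimensional PCA fit (illustrative only)."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.flatnonzero(dist[i] <= r)  # neighborhood of i, including i itself
        if nbrs.size <= d:
            continue  # too few points to fit a d-dimensional affine subspace
        Xc = X[nbrs] - X[nbrs].mean(axis=0)
        # Rows of Vt are principal directions, ordered by decreasing variance.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        V = Vt[:d]
        # Distance from each neighbor to the fitted affine subspace.
        resid = np.linalg.norm(Xc - Xc @ V.T @ V, axis=1)
        W[i, nbrs] = np.exp(-(resid / eta) ** 2)
    W = np.minimum(W, W.T)  # symmetrize: keep a link only if both local fits agree
    np.fill_diagonal(W, 0.0)
    return W


def kmeans(Y, K, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, K):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = Y[labels == k].mean(axis=0)
    return labels


def spectral_labels(W, K):
    """Normalized spectral clustering in the style of Ng, Jordan and Weiss."""
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))  # guard isolated points
    L = W * inv_sqrt[:, None] * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    Y = vecs[:, -K:]  # eigenvectors of the K largest eigenvalues
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return kmeans(Y, K)


def cluster(X, K, d, r, eta):
    """Residual-based affinity followed by the spectral step."""
    return spectral_labels(residual_affinity(X, d, r, eta), K)
```

A call such as cluster(X, K=2, d=1, r=0.35, eta=0.1) returns one integer label per row of X. In this sketch a point with too few neighbors for a local fit gets zero affinity to every other point, which is a crude way of setting aside sparse outliers. Taking the minimum when symmetrizing is one possible design choice, and averaging the two directed affinities would be another.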
