Computer Science – Data Structures and Algorithms
Scientific paper
2011-10-13
19 pages
We study dimensionality reduction for k-means clustering. Dimensionality reduction encompasses two approaches: feature selection and feature extraction. Feature selection picks a small subset of the actual features of the data and then runs the clustering algorithm only on the selected features. Feature extraction constructs a small set of new artificial features and then runs the clustering algorithm only on the constructed features. Despite the significance of the problem and the wealth of heuristic methods addressing it, no provably accurate feature selection method for k-means exists. By contrast, two provably accurate feature extraction methods for k-means exist: the first is randomized and based on Random Projections; the second is deterministic and based on the Singular Value Decomposition. This paper addresses the shortcoming by presenting the first provably accurate feature selection method for k-means clustering. We also present two novel feature extraction methods: the first is based on Random Projections and improves on the existing result in both speed and the number of features that must be extracted; the second is based on fast approximate SVD factorizations and improves on the existing result in speed. All three of our methods are randomized and, with constant probability, provide constant-factor approximation guarantees with respect to the optimal k-means objective value.
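The distinction between the two approaches can be illustrated with a small NumPy sketch. It is not the paper's algorithms: the random sign projection, the leverage-score sampling rule, the plain Lloyd's iterations, the synthetic data, and all function names and parameter values (`extract_features`, `select_features`, `r=20`, and so on) are assumptions chosen for illustration only. Both reduced clusterings are evaluated by the k-means objective in the original feature space.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's iterations; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def kmeans_cost(X, labels, k):
    """k-means objective: sum of squared distances to cluster centroids."""
    cost = 0.0
    for j in range(k):
        members = X[labels == j]
        if len(members) > 0:
            cost += ((members - members.mean(axis=0)) ** 2).sum()
    return cost

def extract_features(X, r, seed=0):
    """Feature extraction: r new features, each a random +/-1 combination
    of all original features (an illustrative random projection)."""
    rng = np.random.default_rng(seed)
    R = rng.choice([-1.0, 1.0], size=(X.shape[1], r)) / np.sqrt(r)
    return X @ R

def select_features(X, k, r, seed=0):
    """Feature selection: sample r original columns with replacement, with
    probability proportional to their leverage scores with respect to the
    top-k right singular vectors, then rescale (illustrative rule)."""
    rng = np.random.default_rng(seed)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = (Vt[:k] ** 2).sum(axis=0)  # one score per original feature
    p = scores / scores.sum()
    idx = rng.choice(X.shape[1], size=r, replace=True, p=p)
    return X[:, idx] / np.sqrt(r * p[idx]), idx

# Synthetic data (assumed): three Gaussian blobs in 200 dimensions.
rng = np.random.default_rng(1)
k, n_per, d = 3, 100, 200
means = rng.normal(scale=5.0, size=(k, d))
X = np.vstack([m + rng.normal(size=(n_per, d)) for m in means])

labels_full = kmeans(X, k)
labels_extr = kmeans(extract_features(X, r=20), k)
X_sel, idx = select_features(X, k, r=20)
labels_sel = kmeans(X_sel, k)

# Compare the clusterings by their cost in the ORIGINAL feature space.
for name, lab in [("all features", labels_full),
                  ("extracted", labels_extr),
                  ("selected", labels_sel)]:
    print(name, kmeans_cost(X, lab, k))
```

In this sketch, selection keeps columns that exist in the data (the indices in `idx`), whereas extraction produces columns that mix every original feature.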
Boutsidis Christos
Drineas Petros
Mahoney Michael W.
Zouzias Anastasios
Title: Stochastic Dimensionality Reduction for K-means Clustering