An Improved Approximation Algorithm for the Column Subset Selection Problem

Computer Science – Data Structures and Algorithms

Scientific paper

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

17 pages; corrected a bug in the spectral norm bound of the previous version

Scientific paper

We consider the problem of selecting the best subset of exactly $k$ columns from an $m \times n$ matrix $A$. We present and analyze a novel two-stage algorithm that runs in $O(\min\{mn^2,m^2n\})$ time and returns as output an $m \times k$ matrix $C$ consisting of exactly $k$ columns of $A$. In the first (randomized) stage, the algorithm randomly selects $\Theta(k \log k)$ columns according to a judiciously-chosen probability distribution that depends on information in the top-$k$ right singular subspace of $A$. In the second (deterministic) stage, the algorithm applies a deterministic column-selection procedure to select and return exactly $k$ columns from the set of columns selected in the first stage. Let $C$ be the $m \times k$ matrix containing those $k$ columns, let $P_C$ denote the projection matrix onto the span of those columns, and let $A_k$ denote the best rank-$k$ approximation to the matrix $A$. Then, we prove that, with probability at least 0.8, $$ \FNorm{A - P_CA} \leq \Theta(k \log^{1/2} k) \FNorm{A-A_k}. $$ This Frobenius norm bound is only a factor of $\sqrt{k \log k}$ worse than the best previously existing existential result and is roughly $O(\sqrt{k!})$ better than the best previous algorithmic result for the Frobenius norm version of this Column Subset Selection Problem (CSSP). We also prove that, with probability at least 0.8, $$ \TNorm{A - P_CA} \leq \Theta(k \log^{1/2} k)\TNorm{A-A_k} + \Theta(k^{3/4}\log^{1/4}k)\FNorm{A-A_k}. $$ This spectral norm bound is not directly comparable to the best previously existing bounds for the spectral norm version of this CSSP. Our bound depends on $\FNorm{A-A_k}$, whereas previous results depend on $\sqrt{n-k}\TNorm{A-A_k}$; if these two quantities are comparable, then our bound is asymptotically worse by a $(k \log k)^{1/4}$ factor.

No associations

LandOfFree

Say what you really think

Search LandOfFree.com for scientists and scientific papers. Rate them and share your experience with other people.

Rating

An Improved Approximation Algorithm for the Column Subset Selection Problem does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.

If you have personal experience with An Improved Approximation Algorithm for the Column Subset Selection Problem, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and An Improved Approximation Algorithm for the Column Subset Selection Problem will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFWR-SCP-O-605041

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.