Curse of Dimensionality in Pivot-based Indexes

Computer Science – Data Structures and Algorithms

Scientific paper

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

9 pp., 4 figures, latex 2e, a revised submission to the 2nd International Workshop on Similarity Search and Applications, 2009

Scientific paper

We offer a theoretical validation of the curse of dimensionality in the pivot-based indexing of datasets for similarity search, by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform the linear scan. A study of the asymptotic performance of pivot-based indexing schemes is performed on a sequence of datasets modeled as samples $X_d$ picked in i.i.d. fashion from metric spaces $\Omega_d$. We allow the size of the dataset $n=n_d$ to be such that $d$, the ``dimension'', is superlogarithmic but subpolynomial in $n$. The number of pivots is allowed to grow as $o(n/d)$. We pick the least restrictive cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the intrinsic dimension of the spaces $\Omega_d$ in the sense of concentration of measure phenomenon is $O(d)$, then the performance of similarity search pivot-based indexes is asymptotically linear in $n$.

No associations

LandOfFree

Say what you really think

Search LandOfFree.com for scientists and scientific papers. Rate them and share your experience with other people.

Rating

Curse of Dimensionality in Pivot-based Indexes does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.

If you have personal experience with Curse of Dimensionality in Pivot-based Indexes, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Curse of Dimensionality in Pivot-based Indexes will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFWR-SCP-O-234327

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.