Computer Science – Computation and Language
Scientific paper
2001-06-17
Proceedings of the 24th SIGIR, pp. 154–162, 2001.
To appear in the proceedings of SIGIR 2001. 11 pages
We consider the problem of creating document representations in which inter-document similarity measurements correspond to semantic similarity. We first present a novel subspace-based framework for formalizing this task. Using this framework, we derive a new analysis of Latent Semantic Indexing (LSI), showing a precise relationship between its performance and the uniformity of the underlying distribution of documents over topics. This analysis helps explain the improvements gained by Ando's (2000) Iterative Residual Rescaling (IRR) algorithm: IRR can compensate for distributional non-uniformity. A further benefit of our framework is that it provides a well-motivated, effective method for automatically determining the rescaling factor IRR depends on, leading to further improvements. A series of experiments over various settings and with several evaluation metrics validates our claims.
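To make the relationship between LSI and IRR concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the update rule, so the loop below follows the usual description of IRR from Ando (2000), in which each residual document vector is rescaled by its length raised to a power q before the next basis vector is extracted. The function names `lsi_basis` and `irr_basis`, and the toy matrix, are hypothetical. The automatic choice of q mentioned in the abstract is not shown.

```python
import numpy as np


def lsi_basis(D, k):
    """Top-k left singular vectors of a term-by-document matrix D (LSI subspace)."""
    U, _, _ = np.linalg.svd(D, full_matrices=False)
    return U[:, :k]


def irr_basis(D, k, q):
    """Sketch of Iterative Residual Rescaling (assumed form, see lead-in).

    D : term-by-document matrix, one column per document.
    k : number of basis vectors to extract.
    q : rescaling factor; q = 0 applies no rescaling and recovers the LSI subspace.
    """
    R = np.asarray(D, dtype=float).copy()  # residual matrix, starts as D
    basis = []
    for _ in range(k):
        # Rescale each residual column by its own length raised to the power q,
        # so documents that are poorly covered so far get more weight.
        norms = np.linalg.norm(R, axis=0)
        Rs = R * norms**q
        # Next basis vector: dominant left singular vector of the rescaled residuals.
        U, _, _ = np.linalg.svd(Rs, full_matrices=False)
        b = U[:, 0]
        basis.append(b)
        # Remove the component along b from every (unscaled) residual column.
        R = R - np.outer(b, b @ R)
    return np.column_stack(basis)


if __name__ == "__main__":
    # Toy 5-term x 6-document matrix with a non-uniform topic distribution:
    # four documents on one topic, two on another (hypothetical data).
    D = np.array([
        [2, 1, 2, 1, 0, 0],
        [1, 2, 1, 2, 0, 0],
        [1, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 1, 2],
        [0, 0, 0, 0, 2, 1],
    ], dtype=float)
    B = irr_basis(D, k=2, q=2.0)
    docs = B.T @ D  # documents projected into the 2-dimensional subspace
    print(docs.shape)
```

With q = 0 the rescaling step is a no-op, so the loop extracts the same subspace as the truncated SVD; larger q pushes the basis toward documents from under-represented topics, which is the compensation for non-uniformity that the abstract describes.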
Ando, Rie Kubota
Lee, Lillian
Iterative Residual Rescaling: An Analysis and Generalization of LSI