Evaluating the Impact of Information Distortion on Normalized Compression Distance

Computer Science – Information Theory

Scientific paper

Details

5 pages, 9 figures. Submitted to the ICMCTA 2008

In this paper we apply different information distortion techniques to a set of classical books written in English. We study the impact of these distortions on the Kolmogorov complexity of the books and on the clustering by compression technique, which is based on the Normalized Compression Distance (NCD). We show how to decrease the complexity of the books by introducing several kinds of modifications. We measure how well the information contained in each book is preserved using a clustering error measure. We find experimentally that the best way to keep the clustering error low is to modify the most frequent words. We explain the details of these distortions and compare them with other kinds of modifications, such as random word distortions and infrequent word distortions. Finally, we present some phenomenological explanations of the empirical results.
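
As a rough illustration of the quantities the abstract refers to, the sketch below computes NCD in its usual form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length, and applies one possible frequent-word distortion. Several choices here are assumptions, not details taken from the paper: zlib stands in for the compressor, words are taken to be runs of ASCII letters, and a distorted word is replaced by asterisks of the same length. The function names are hypothetical.

```python
import re
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes; zlib is an assumed stand-in for C(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distort_frequent_words(text: str, top_n: int) -> str:
    """Mask the top_n most frequent words (case-insensitive) with asterisks.

    Masking with same-length asterisk runs is an illustrative assumption;
    the abstract does not specify the replacement scheme.
    """
    words = re.findall(r"[A-Za-z]+", text.lower())
    frequent = {word for word, _ in Counter(words).most_common(top_n)}

    def mask(match: re.Match) -> str:
        word = match.group(0)
        return "*" * len(word) if word.lower() in frequent else word

    return re.sub(r"[A-Za-z]+", mask, text)


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    print(distort_frequent_words(sample, 1))
    print(ncd(sample.encode(), sample.encode()))
```

To mimic the kind of experiment described, one could distort each book with increasing values of top_n, compare compressed_size before and after, and recompute the pairwise NCD matrix used for clustering. Swapping distort_frequent_words for a random or infrequent-word variant would give the comparison baselines the abstract mentions.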
