Improving PPM Algorithm Using Dictionaries
Computer Science – Information Theory
Scientific paper
2010-12-17
7 pages, 4 figures; longer version of the DCC 2011 paper
We propose a method to improve traditional character-based PPM text compression algorithms. Treating a text file as a sequence of alternating words and non-words, the algorithm encodes non-words and word prefixes with character-based context models and encodes word suffixes with dictionary models. Because a dictionary model can encode multiple characters as a single symbol, the algorithm improves compression efficiency. The proposed algorithm has three advantages: 1) it requires no text preprocessing; 2) it needs no explicit codeword to signal a switch between the context and dictionary models; 3) it can be applied to any character-based PPM algorithm without incurring much additional computational cost. Test results show significant improvements over character-based PPM, especially at low model orders.
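The word/non-word split and the prefix/suffix hand-off described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the fixed switching rule (the dictionary model takes over after exactly `PREFIX_LEN` letters when that prefix occurs in the dictionary), the escape symbol for words missing from the dictionary, and the names `PREFIX_LEN`, `build_table`, `encode` and `decode` are all hypothetical. The sketch only produces the stream of model events; a real PPM coder would arithmetic-code each event with adaptive probabilities.

```python
import re
import string
from collections import defaultdict

# Hypothetical parameter: leading letters of each word that are always coded
# by the character-based context model before the dictionary model may take over.
PREFIX_LEN = 2
LETTERS = set(string.ascii_letters)
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z]+")  # alternating words / non-words


def build_table(words):
    """Map each dictionary prefix to the sorted list of suffixes that follow it."""
    table = defaultdict(set)
    for w in words:
        if len(w) >= PREFIX_LEN:
            table[w[:PREFIX_LEN]].add(w[PREFIX_LEN:])
    return {p: sorted(s) for p, s in table.items()}


def encode(text, table):
    """Turn text into model events (a real coder would arithmetic-code these)."""
    events = []
    for token in TOKEN_RE.findall(text):
        if token[0] not in LETTERS:  # non-word: character model only
            events.extend(("ctx", ch) for ch in token)
            continue
        prefix = token[:PREFIX_LEN]
        events.extend(("ctx", ch) for ch in prefix)
        if len(prefix) == PREFIX_LEN and prefix in table:
            candidates = table[prefix]
            suffix = token[PREFIX_LEN:]
            if suffix in candidates:
                # The whole suffix becomes a single dictionary symbol.
                events.append(("dict", candidates.index(suffix)))
                continue
            # Hypothetical escape: index one past the candidate list.
            events.append(("dict", len(candidates)))
        events.extend(("ctx", ch) for ch in token[PREFIX_LEN:])
    return events


def decode(events, table):
    """Rebuild the text; the switch to the dictionary model is inferred from state."""
    out, word, expect_dict = [], "", False
    for kind, value in events:
        if (kind == "dict") != expect_dict:
            raise ValueError("event stream violates the implicit switching rule")
        if kind == "dict":
            candidates = table[word]
            if value < len(candidates):  # otherwise it was the escape symbol
                out.append(candidates[value])
                word += candidates[value]
            expect_dict = False
            continue
        out.append(value)
        if value in LETTERS:
            word += value
            expect_dict = len(word) == PREFIX_LEN and word in table
        else:
            word = ""  # a non-word character ends the current word
    return "".join(out)


if __name__ == "__main__":
    table = build_table(["compression", "context", "dictionary", "model", "models"])
    text = "context models, dictionary models, and compression!"
    events = encode(text, table)
    assert decode(events, table) == text
    print(len(text), "characters ->", len(events), "events")
```

Because `decode` derives from its own state when a dictionary symbol must come next, the event stream carries no separate switch flag, which loosely mirrors advantage 2) in the abstract. The fixed prefix length is chosen here only for simplicity; the paper's actual switching criterion is not given in this abstract.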
Hu Yichuan
Khan Farooq
Li Ying
Zhang Jianzhong