Artificial Sequences and Complexity Measures

Physics – Condensed Matter – Statistical Mechanics

Scientific paper


Details

Revised version, with major changes, of previous "Data Compression approach to Information Extraction and Classification" by A


10.1088/1742-5468/2005/04/P04002

In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools to extract information, in an automatic and agnostic way, from a generic string of characters. In particular, we introduce a class of methods that rely crucially on data compression techniques to define a measure of remoteness and distance between pairs of character sequences (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques can be used to introduce the notions of the dictionary of a given sequence and of Artificial Text, and we show how these new tools can be used for information extraction. We point out the versatility and generality of our method, which applies to any kind of corpus of character strings independently of the type of coding behind them. As a case study we consider linguistically motivated problems, and we present results for automatic language recognition, authorship attribution and self-consistent classification.
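The idea of a compression-based remoteness between two sequences can be illustrated with a minimal sketch. It is not the paper's own algorithm: it assumes an off-the-shelf DEFLATE compressor (Python's `zlib`) and a simple "append a short probe and measure the extra compressed bytes" scheme. The function names `compressed_size`, `cross_cost` and `remoteness`, and the `probe_len` parameter, are hypothetical and chosen only for this illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of `data`."""
    return len(zlib.compress(data, 9))


def cross_cost(reference: bytes, probe: bytes) -> int:
    """Extra compressed bytes needed to encode `probe` after `reference`.

    A compressor that has already seen `reference` encodes `probe` cheaply
    when both come from the same kind of source.
    """
    return compressed_size(reference + probe) - compressed_size(reference)


def remoteness(a: bytes, b: bytes, probe_len: int = 512) -> float:
    """Rough per-byte estimate of how poorly `a` predicts the source of `b`.

    Assumption of this sketch: the last `probe_len` bytes of `b` serve as
    the probe, and the remainder of `b` is the baseline context.
    """
    if len(b) <= probe_len:
        raise ValueError("b must be longer than probe_len")
    probe = b[-probe_len:]
    b_context = b[:-probe_len]
    # Cost of the probe given `a`, minus its cost given its own source.
    return (cross_cost(a, probe) - cross_cost(b_context, probe)) / len(probe)


if __name__ == "__main__":
    english = b"the quick brown fox jumps over the lazy dog " * 50
    italian = b"nel mezzo del cammin di nostra vita mi ritrovai " * 50
    print("italian -> english:", remoteness(italian, english))
    print("english -> english:", remoteness(english, english))
```

In this sketch a larger value means the reference sequence helps less in encoding the probe. Note that DEFLATE only looks back over a window of at most 32 KiB, so the reference should stay within that size for the probe to benefit from it; a compressor with a longer memory would be needed for larger corpora.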
