Analyse spectrale des textes: détection automatique des frontières de langue et de discours

Computer Science – Computation and Language

Scientific paper

Details

In French. 10 pages, 5 figures, LaTeX 2e using EPSF and custom package taln2006.sty (designed by Pierre Zweigenbaum, ATALA). P

We propose a theoretical framework within which information about the vocabulary of a given corpus can be inferred from statistical information gathered on that corpus. Inferences can be made about the categories of the words in the vocabulary and about their syntactic properties within particular languages. From the same statistical data, it is possible to build matrices of syntagmatic similarity (bigram transition matrices) or of paradigmatic similarity (the probability that any pair of words shares common contexts). When clustered by syntagmatic similarity, words tend to group into sublanguage vocabularies; when clustered by paradigmatic similarity, they tend to group into syntactic or semantic classes. Our experiments have explored the first of these two possibilities. Their results are interpreted within the framework of a Markov chain model of the corpus's generative process(es): we show that the results of a spectral analysis of the transition matrix can be interpreted as probability distributions of words within clusters. This method yields a soft clustering of the vocabulary into sublanguages, each of which contributes to the generation of heterogeneous corpora. As an application, we show how multilingual texts can be visually segmented into linguistically homogeneous segments. Our method is particularly useful for related languages that happen to be mixed within a corpus.
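
The idea of reading word distributions off a bigram transition matrix can be illustrated with a toy sketch. The code below is not the paper's implementation: the function names `transition_matrix` and `lazy_stationary`, the two example sentences, and the use of plain power iteration instead of a full eigendecomposition are all assumptions made for illustration. It builds a row-stochastic bigram matrix from two sentences that share no vocabulary, then follows the Markov chain from one English word. Because the chain never crosses into the other sublanguage, the limiting distribution is a probability distribution over the English cluster only.

```python
from collections import Counter


def transition_matrix(sentences):
    """Row-stochastic bigram matrix: P[i][j] estimates P(next=vocab[j] | current=vocab[i])."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = Counter()
    for s in sentences:
        counts.update(zip(s, s[1:]))  # bigrams are counted within a sentence only
    n = len(vocab)
    P = [[0.0] * n for _ in range(n)]
    for (a, b), c in counts.items():
        P[index[a]][index[b]] = float(c)
    for i in range(n):
        total = sum(P[i])
        if total == 0:
            P[i][i] = 1.0  # assumption: a word with no successor gets a self-loop
        else:
            P[i] = [x / total for x in P[i]]
    return vocab, P


def lazy_stationary(P, start, steps=200):
    """Left power iteration on the lazy chain (I + P) / 2.

    The lazy chain has the same stationary distributions as P but is
    aperiodic, so the iteration converges even when P is periodic.
    """
    n = len(P)
    pi = [0.0] * n
    pi[start] = 1.0
    for _ in range(steps):
        pi = [0.5 * pi[j] + 0.5 * sum(pi[i] * P[i][j] for i in range(n))
              for j in range(n)]
    return pi


# Hypothetical toy corpus: two sublanguages with disjoint vocabularies.
english = "the cat sees the dog and the dog sees the cat".split()
french = "le chat voit le chien et le chien voit le chat".split()

vocab, P = transition_matrix([english, french])
pi_en = lazy_stationary(P, vocab.index("the"))

# Words with non-zero probability form the cluster reached from "the".
cluster = {w: round(p, 3) for w, p in zip(vocab, pi_en) if p > 0}
print(cluster)
```

In this toy case the clustering is hard, since the two vocabularies never mix. With real mixed corpora, shared words and cross-language bigrams couple the blocks, which is where a soft clustering from the leading eigenvectors of the transition matrix becomes relevant.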
