Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology

Computer Science – Information Retrieval

Scientific paper

Details

(postprint); This version corrects a couple of errors in authors' names in the bibliography

This paper presents experiments in clustering homogeneous XML documents to validate an existing classification or, more generally, an organisational structure. Our approach combines techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used to represent documents and its impact on the emerging classification. We combine the selection of structured features with fine-grained textual selection based on syntactic characteristics. We illustrate and evaluate this approach on a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or on different chapters of these activity reports. We then compare the clustering results obtained with different feature selections against the official theme structure used by Inria.
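
The pipeline described in the abstract (select features from chosen parts of each XML document, cluster, then compare against a reference grouping) can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the toy reports, the element names `keywords` and `body`, the theme labels `T1` and `T2`, average-link agglomerative clustering and the Rand index are all assumptions chosen to keep the example small and self-contained.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical toy reports; element names and contents are assumptions.
REPORTS = {
    "projA": "<report><keywords>xml clustering retrieval</keywords>"
             "<body>documents text mining</body></report>",
    "projB": "<report><keywords>clustering xml mining</keywords>"
             "<body>documents retrieval</body></report>",
    "projC": "<report><keywords>robot vision control</keywords>"
             "<body>sensors motion</body></report>",
    "projD": "<report><keywords>vision robot sensors</keywords>"
             "<body>control motion planning</body></report>",
}
# Hypothetical reference grouping standing in for the official themes.
OFFICIAL_THEME = {"projA": "T1", "projB": "T1", "projC": "T2", "projD": "T2"}


def extract_features(xml_text, tags):
    """Bag of words built only from the text of the selected elements."""
    root = ET.fromstring(xml_text)
    words = Counter()
    for tag in tags:
        for elem in root.iter(tag):
            words.update((elem.text or "").lower().split())
    return words


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(vectors, k):
    """Average-link agglomerative clustering, merging down to k clusters."""
    clusters = [[name] for name in sorted(vectors)]

    def sim(c1, c2):
        total = sum(cosine(vectors[x], vectors[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the most similar pair of clusters (i < j by construction).
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: sim(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


def rand_index(clusters, reference):
    """Fraction of document pairs on which clustering and reference agree."""
    label = {name: idx for idx, members in enumerate(clusters)
             for name in members}
    pairs = list(combinations(sorted(reference), 2))
    agree = sum((label[a] == label[b]) == (reference[a] == reference[b])
                for a, b in pairs)
    return agree / len(pairs)


def evaluate(tags, k=2):
    """Cluster the reports using only the given elements; score vs. themes."""
    vectors = {name: extract_features(x, tags) for name, x in REPORTS.items()}
    return rand_index(cluster(vectors, k), OFFICIAL_THEME)


if __name__ == "__main__":
    for selection in (["keywords"], ["body"], ["keywords", "body"]):
        print(selection, evaluate(selection))
```

Calling `evaluate` with different element lists mimics the comparison of feature selections: each selection yields its own clustering, which is scored against the same reference grouping. Any other clustering algorithm or agreement measure could be substituted without changing the overall structure.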
