Computer Science – Information Retrieval
Scientific paper
2005-08-04
In 5ème Journées d'Extraction et de Gestion des Connaissances (EGC 2005)
This version corrects errors in the names of 2 authors cited in the bibliography.
This paper presents experiments in clustering homogeneous XML documents to validate an existing classification or, more generally, an organisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used to represent documents and on its impact on the emerging classification. We combine the selection of structured features with fine-grained textual selection based on syntactic characteristics. We illustrate and evaluate this approach on a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or on different chapters of these activity reports. We then compare the clusterings obtained with different feature selections against the official theme structure used by Inria.
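The pipeline summarised in the abstract (select features from chosen XML elements, cluster the documents, compare the result with a reference partition) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the element names `keywords` and `body`, the project identifiers, the theme labels, and the toy texts are invented, and average-linkage agglomerative clustering on Jaccard similarity together with the Rand index are stand-ins, not necessarily the clustering method or the comparison measure used in the paper. The syntactic (part-of-speech based) word selection mentioned in the abstract is not reproduced.

```python
import itertools
import re
import xml.etree.ElementTree as ET

# Toy stand-ins for activity reports. Element names, project ids and
# theme labels are hypothetical and chosen only for this illustration.
REPORTS = {
    "proj_a": "<report><keywords>xml documents clustering</keywords>"
              "<body>software tools for users</body></report>",
    "proj_b": "<report><keywords>xml documents retrieval</keywords>"
              "<body>numerical models for simulation</body></report>",
    "proj_c": "<report><keywords>robot vision control</keywords>"
              "<body>software tools for users</body></report>",
    "proj_d": "<report><keywords>robot motion control</keywords>"
              "<body>numerical models for simulation</body></report>",
}
OFFICIAL_THEMES = {"proj_a": "T1", "proj_b": "T1", "proj_c": "T2", "proj_d": "T2"}


def extract_features(xml_text, tags):
    """Structured feature selection: keep only words found in the given elements."""
    root = ET.fromstring(xml_text)
    words = set()
    for tag in tags:
        for node in root.iter(tag):
            text = "".join(node.itertext()).lower()
            words.update(re.findall(r"[a-z]+", text))
    return words


def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(features, k):
    """Average-linkage agglomerative clustering down to k groups."""
    clusters = [[name] for name in sorted(features)]

    def linkage(c1, c2):
        sims = [jaccard(features[x], features[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > k:
        # Merge the most similar pair of clusters (i < j, so popping j keeps i valid).
        i, j = max(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] += clusters.pop(j)
    return clusters


def rand_index(clusters, reference):
    """Fraction of document pairs on which the clustering and the reference agree."""
    label = {name: idx for idx, members in enumerate(clusters) for name in members}
    pairs = list(itertools.combinations(sorted(reference), 2))
    agree = sum(
        (label[a] == label[b]) == (reference[a] == reference[b]) for a, b in pairs
    )
    return agree / len(pairs)


def evaluate(tags, k=2):
    """Cluster the toy reports using only the given elements and score the result."""
    features = {name: extract_features(xml, tags) for name, xml in REPORTS.items()}
    clusters = cluster(features, k)
    return clusters, rand_index(clusters, OFFICIAL_THEMES)


if __name__ == "__main__":
    for tags in (["keywords"], ["body"]):
        clusters, score = evaluate(tags)
        print(tags, clusters, round(score, 3))
```

On this invented data, selecting the `keywords` element reproduces the reference themes, while selecting the `body` element groups the projects differently. This only illustrates the kind of comparison the abstract describes and says nothing about the paper's actual results.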
Despeyroux Thierry
Lechevallier Yves
Trousse Brigitte
Vercoustre Anne-Marie
Original title: Expériences de classification d'une collection de documents XML de structure homogène (Experiments in classifying a collection of XML documents with a homogeneous structure)