A Flexible Structured-based Representation for XML Document Mining

Computer Science – Information Retrieval

Scientific paper

Details

This is the authors' version. To access the final version go to the editor's site through the DOI

10.1007/11766278_34

This paper reports on the INRIA group's approach to XML mining in the INEX XML Mining track 2005. We use a flexible representation of XML documents that can take into account either the structure only or both the structure and the content. Our approach represents each XML document by a set of its sub-paths, selected according to some criteria (length, beginning at the root, ending at a leaf). By treating those sub-paths as words, we can use standard methods for vocabulary reduction and simple clustering methods such as K-means that scale well. We actually use an implementation of the clustering algorithm known as "dynamic clouds", which can work with distinct groups of independent variables held in separate variables. This is useful in our model because embedded sub-paths are not independent: we split potentially dependent paths into separate variables, so that each variable contains only independent paths. Experiments with the INEX collections show good results for the structure-only collections, but our approach did not scale well to the large structure-and-content collections.
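To make the sub-path representation concrete, the following is a minimal sketch of how a document might be turned into a bag of sub-path "words". It is not the authors' implementation: the function names `iter_sub_paths` and `path_bag`, and the parameters `min_len`, `max_len`, `from_root` and `to_leaf`, are hypothetical and only reflect one reading of the criteria named in the abstract (length, beginning at the root, ending at a leaf). Content is ignored, so this covers the structure-only case.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def iter_sub_paths(elem, ancestors=(), min_len=1, max_len=3,
                   from_root=False, to_leaf=False):
    """Yield every tag sub-path that ends at `elem` or at one of its descendants.

    Hypothetical criteria (assumptions, not taken from the paper):
      min_len, max_len -- bounds on the number of tags in a sub-path
      from_root        -- keep only sub-paths that start at the document root
      to_leaf          -- keep only sub-paths that end at a leaf element
    """
    chain = ancestors + (elem.tag,)
    is_leaf = len(elem) == 0  # an element with no child elements
    if is_leaf or not to_leaf:
        for length in range(min_len, min(max_len, len(chain)) + 1):
            start = len(chain) - length
            if from_root and start != 0:
                continue
            yield "/".join(chain[start:])
    for child in elem:
        yield from iter_sub_paths(child, chain, min_len, max_len,
                                  from_root, to_leaf)


def path_bag(xml_text, **criteria):
    """Count the sub-path 'words' of one XML document."""
    root = ET.fromstring(xml_text)
    return Counter(iter_sub_paths(root, **criteria))


doc = "<article><title/><sec><p/><p/></sec></article>"
print(path_bag(doc))
print(path_bag(doc, from_root=True, to_leaf=True))
```

Each sub-path occurrence ends at exactly one element, so it is counted once. The resulting counts can be stacked into a document-by-sub-path matrix and passed to any standard vocabulary reduction or K-means routine. The grouping of dependent sub-paths into separate variables described in the abstract is not modelled here.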
