Weighted Naive Bayes Model for Semi-Structured Document Categorization

Computer Science – Information Retrieval

Scientific paper

The aim of this paper is the supervised classification of semi-structured data. A formal model based on Bayesian classification is developed, addressing the integration of document structure into classification tasks. We define what we call the structural context of occurrence for unstructured data, and we derive a recursive formulation in which parameters weight the contribution of each structural element relative to the others. A simplified version of this formal model is implemented to carry out textual document classification experiments. First results show, for an ad hoc weighting strategy, that the structural context of word occurrences has a significant impact on classification results compared to the performance of a simple multinomial naive Bayes classifier. The proposed implementation is competitive on the Reuters-21578 data with an SVM classifier, whether or not the SVM is combined with the splitting of structural components. These results encourage exploring the learning of acceptable weighting strategies for this model, in particular boosting strategies.
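
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea, not the paper's recursive formulation. It assumes a flat simplification in which each word occurrence is multiplied by a weight attached to the structural element it appears in, and the weighted counts feed an ordinary multinomial naive Bayes classifier with Laplace smoothing. The class name `WeightedNB`, the element names, and the weight values are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def weighted_counts(doc, weights):
    """Collapse a structured document into weighted word counts.

    `doc` maps a structural element name (e.g. "title") to a list of tokens.
    Elements without an explicit weight default to 1.0 (an assumption).
    """
    counts = Counter()
    for element, tokens in doc.items():
        w = weights.get(element, 1.0)
        for tok in tokens:
            counts[tok] += w
    return counts


class WeightedNB:
    """Multinomial naive Bayes over structure-weighted counts (illustrative)."""

    def __init__(self, weights, alpha=1.0):
        self.weights = weights  # hypothetical per-element weights
        self.alpha = alpha      # Laplace smoothing constant

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)
        self.word_mass = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            wc = weighted_counts(doc, self.weights)
            self.word_mass[label].update(wc)  # adds the weighted counts
            self.vocab.update(wc)
        self.total_mass = {c: sum(m.values()) for c, m in self.word_mass.items()}
        return self

    def log_scores(self, doc):
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        wc = weighted_counts(doc, self.weights)
        scores = {}
        for c in self.class_docs:
            score = math.log(self.class_docs[c] / n_docs)  # log prior
            denom = self.total_mass[c] + self.alpha * vocab_size
            for tok, k in wc.items():
                if tok in self.vocab:  # ignore words unseen in training
                    p = (self.word_mass[c][tok] + self.alpha) / denom
                    score += k * math.log(p)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Toy usage with made-up documents; the weights are arbitrary.
train_docs = [
    {"title": ["wheat"], "body": ["grain", "export"]},
    {"title": ["oil"], "body": ["crude", "export"]},
]
train_labels = ["grain", "crude"]
model = WeightedNB({"title": 2.0, "body": 1.0}).fit(train_docs, train_labels)
prediction = model.predict({"title": ["wheat"], "body": ["export"]})
```

Setting every weight to 1.0 reduces this sketch to a plain multinomial naive Bayes classifier, which is the baseline the abstract compares against. How the weights should be chosen or learned is left open here, as it is in the abstract.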
