Hierarchical Web Page Classification Based on a Topic Model and Neighboring Pages Integration

Computer Science – Learning

Scientific paper


Pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS, Vol. 7 No. 2, February 2010, US

Most Web page classification models use the bag-of-words (BOW) model to represent the feature space. The original BOW representation, however, cannot capture semantic relationships between terms. One possible solution is a topic model based on the Latent Dirichlet Allocation (LDA) algorithm, which clusters term features into a set of latent topics so that terms assigned to the same topic are semantically related. In this paper, we propose a novel hierarchical classification method that is based on a topic model and integrates additional term features from neighboring pages. The method consists of two phases: (1) feature representation using a topic model and neighboring-page integration, and (2) a hierarchical Support Vector Machine (SVM) classification model constructed from a confusion matrix. In our experiments, the proposed hierarchical SVM model, which integrates the current page with its neighboring pages via the topic model, yielded the best performance, with an accuracy of 90.33% and an F1 measure of 90.14%; these are improvements of 5.12% and 5.13%, respectively, over the original SVM model.
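
The two phases can be illustrated with a minimal Python sketch. The abstract does not say how neighboring pages are weighted or how the hierarchy is derived from the confusion matrix, so everything here is an assumption made for illustration: the function names `integrate_neighbors` and `group_confused_classes`, the `weight` and `threshold` parameters, the weighted-average blending rule, and the rule of merging classes that are frequently confused with each other. Topic distributions are taken as given (in practice they would come from an LDA model), and training one SVM per resulting group is not shown.

```python
from collections import defaultdict


def integrate_neighbors(page_topics, neighbor_topics, weight=0.5):
    """Blend a page's topic distribution with the mean of its neighbors'.

    Hypothetical rule: (1 - weight) * page + weight * mean(neighbors).
    """
    if not neighbor_topics:
        return list(page_topics)
    n = len(neighbor_topics)
    mean = [sum(vec[k] for vec in neighbor_topics) / n
            for k in range(len(page_topics))]
    return [(1 - weight) * p + weight * m for p, m in zip(page_topics, mean)]


def group_confused_classes(confusion, labels, threshold=0.1):
    """Merge classes whose mutual confusion rate reaches the threshold.

    confusion[i][j] counts pages of true class i predicted as class j.
    Hypothetical rule: classes i and j share a parent node when
    (confusion[i][j] + confusion[j][i]) / (total_i + total_j) >= threshold.
    """
    parent = list(range(len(labels)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    totals = [sum(row) for row in confusion]
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            denom = totals[i] + totals[j]
            rate = (confusion[i][j] + confusion[j][i]) / denom if denom else 0.0
            if rate >= threshold:
                parent[find(j)] = find(i)

    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[find(idx)].append(label)
    return sorted(groups.values())


# Toy data, invented for illustration only (not from the paper).
labels = ["arts", "music", "sports", "health"]
confusion = [
    [80, 18, 1, 1],
    [15, 82, 2, 1],
    [1, 1, 95, 3],
    [2, 1, 4, 93],
]
hierarchy = group_confused_classes(confusion, labels)
blended = integrate_neighbors([0.8, 0.2], [[0.4, 0.6], [0.2, 0.8]])
```

With the toy matrix above, only "arts" and "music" are confused often enough to be grouped, so a top-level classifier would separate three groups and a second-level classifier would split the merged pair.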
