Hierarchical Multiclass Decompositions with Application to Authorship Determination

Computer Science – Artificial Intelligence

Scientific paper

This paper addresses the question of how to decompose multiclass classification problems into binary subproblems. We extend known Jensen-Shannon bounds on the Bayes risk of binary problems to hierarchical multiclass problems and use these bounds to develop a heuristic procedure for constructing hierarchical multiclass decompositions for multinomials. We test our method and compare it to the well-known "all-pairs" decomposition. Our tests use a new authorship determination benchmark of machine learning authors. The new method consistently outperforms the all-pairs decomposition when the number of classes is small and breaks even on larger multiclass problems. With both methods, an SVM over a feature set of high-frequency single tokens and high-frequency token pairs achieves classification accuracy that appears to be exceptionally high compared to known results in authorship determination.
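
To make the idea of a hierarchical decomposition driven by Jensen-Shannon divergence concrete, the following Python sketch splits a set of classes into a binary tree. It is an illustration only and should not be read as the procedure developed in the paper: it assumes each class is summarized by a single multinomial over a shared vocabulary, assumes uniform class priors, and chooses each split by exhaustive search for the two groups whose mixture distributions have the largest weighted Jensen-Shannon divergence. The function names (`entropy`, `js_divergence`, `build_tree`) and the toy distributions are hypothetical.

```python
import itertools
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())

def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence: H(w_p*p + w_q*q) - w_p*H(p) - w_q*H(q)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return entropy(w_p * p + w_q * q) - w_p * entropy(p) - w_q * entropy(q)

def build_tree(classes, dists):
    """Recursively split `classes` into two groups with maximal JS divergence.

    classes: list of class labels
    dists:   dict mapping each label to its multinomial (probability vector)
    Returns a nested tuple; leaves are class labels.
    Assumption: uniform class priors, so a group is the mean of its members.
    The search is exhaustive, so this only suits a small number of classes.
    """
    if len(classes) == 1:
        return classes[0]
    first, rest = classes[0], classes[1:]
    best = None
    # Keep `first` on the left side so mirrored splits are not scored twice.
    for r in range(len(rest)):
        for extra in itertools.combinations(rest, r):
            left = [first, *extra]
            right = [c for c in rest if c not in extra]
            p = np.mean([dists[c] for c in left], axis=0)
            q = np.mean([dists[c] for c in right], axis=0)
            w = len(left) / len(classes)
            score = js_divergence(p, q, w, 1.0 - w)
            if best is None or score > best[0]:
                best = (score, left, right)
    _, left, right = best
    return (build_tree(left, dists), build_tree(right, dists))

# Toy example: four "authors" over a three-token vocabulary.
dists = {
    "a": [0.7, 0.2, 0.1],
    "b": [0.6, 0.3, 0.1],
    "c": [0.1, 0.2, 0.7],
    "d": [0.1, 0.3, 0.6],
}
tree = build_tree(list("abcd"), dists)
print(tree)  # (('a', 'b'), ('c', 'd'))
```

Each internal node of the resulting tree corresponds to one binary subproblem, so a tree over k classes needs k - 1 binary classifiers, whereas the all-pairs decomposition trains k(k - 1)/2 of them (3 versus 6 in this toy case).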

Profile ID: LFWR-SCP-O-658824
