Hierarchical Multiclass Decompositions with Application to Authorship Determination

Published 11 Oct 2010 in cs.AI | (1010.2102v1)

Abstract: This paper is mainly concerned with the question of how to decompose multiclass classification problems into binary subproblems. We extend known Jensen-Shannon bounds on the Bayes risk of binary problems to hierarchical multiclass problems and use these bounds to develop a heuristic procedure for constructing hierarchical multiclass decompositions for multinomials. We test our method and compare it to the well-known "all-pairs" decomposition. Our tests are performed using a new authorship determination benchmark of machine learning authors. The new method consistently outperforms the all-pairs decomposition when the number of classes is small and breaks even on larger multiclass problems. With both methods, the classification accuracy we achieve, using an SVM over a feature set consisting of both high-frequency single tokens and high-frequency token pairs, appears to be exceptionally high compared to known results in authorship determination.
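As a rough illustration of the idea in the abstract, the sketch below builds a binary class hierarchy by repeatedly splitting a set of classes into the two groups whose pooled multinomials have the largest Jensen-Shannon divergence. This is not the paper's procedure, which the abstract does not spell out. The exhaustive search, the uniform class priors, and the names `js_divergence`, `pooled`, `best_split`, and `build_tree` are all assumptions made for this example, and the toy distributions are invented.

```python
import math
from itertools import combinations

def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence, in bits, between two distributions."""
    m = [w_p * a + w_q * b for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return w_p * kl(p, m) + w_q * kl(q, m)

def pooled(classes, dists):
    """Average the multinomials of a group of classes (uniform priors assumed)."""
    dim = len(dists[classes[0]])
    return [sum(dists[c][i] for c in classes) / len(classes) for i in range(dim)]

def best_split(classes, dists):
    """Try every two-way split and keep the one with the largest divergence."""
    first, rest = classes[0], classes[1:]
    best = None
    # `first` always goes left, so mirrored splits are not scored twice,
    # and the right group is never empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = [first, *extra]
            right = [c for c in rest if c not in extra]
            w_left = len(left) / len(classes)
            score = js_divergence(pooled(left, dists), pooled(right, dists),
                                  w_left, 1.0 - w_left)
            if best is None or score > best[0]:
                best = (score, left, right)
    return best[1], best[2]

def build_tree(classes, dists):
    """Return a nested tuple; each internal node is one binary subproblem."""
    if len(classes) == 1:
        return classes[0]
    left, right = best_split(classes, dists)
    return (build_tree(left, dists), build_tree(right, dists))

# Toy token distributions for four hypothetical authors.
dists = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.6, 0.3, 0.1],
    "C": [0.1, 0.2, 0.7],
    "D": [0.1, 0.3, 0.6],
}
tree = build_tree(["A", "B", "C", "D"], dists)
print(tree)
```

Each internal node of the resulting tree would be handled by one binary classifier, so K classes need K - 1 classifiers, whereas an all-pairs decomposition trains one classifier for each of the K(K - 1)/2 class pairs. The exhaustive split search is exponential in the number of classes and is only practical for small toy cases like this one.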

Citations (3)
