Large Margin Taxonomy Embedding with an Application to Document Categorization
Source:
Neural Information Processing Systems (NIPS), Volume 21 (2008)
Keywords:
classification, embedding, taxonomy
Abstract:
Applications of multi-class classification, such as document categorization, often
appear in cost-sensitive settings. Recent work has significantly improved the state
of the art by moving beyond “flat” classification through incorporation of class
hierarchies [4]. We present a novel algorithm that goes beyond hierarchical
classification and estimates the latent semantic space that underlies the class hierarchy.
In this space, each class is represented by a prototype and classification is done
with the simple nearest neighbor rule. The optimization of the semantic space
incorporates large margin constraints that ensure that for each instance the correct
class prototype is closer than any other. We show that our optimization is convex
and can be solved efficiently for large data sets. Experiments on the OHSUMED
medical journal database yield state-of-the-art results on topic categorization.
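
The nearest-prototype rule and the large margin condition described in the abstract can be illustrated with a minimal sketch. It assumes that instances have already been mapped into the semantic space (the abstract does not specify how), that distances are squared Euclidean, and that violations are penalized with a hinge of unit margin. The names nearest_prototype, margin_violations, Z, P, and margin are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def squared_distances(Z, P):
    """Squared Euclidean distance from each instance (rows of Z) to each prototype (rows of P)."""
    return ((Z[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)

def nearest_prototype(Z, P):
    """Assign each embedded instance to the class whose prototype is closest."""
    return squared_distances(Z, P).argmin(axis=1)

def margin_violations(Z, y, P, margin=1.0):
    """Total hinge slack: how far each instance is from satisfying
    d(z, p_correct) + margin <= d(z, p_other) for every other class.
    The hinge form and the margin value are assumptions of this sketch."""
    d2 = squared_distances(Z, P)
    rows = np.arange(len(y))
    true_d2 = d2[rows, y][:, None]
    slack = np.maximum(0.0, true_d2 + margin - d2)
    slack[rows, y] = 0.0  # the correct class does not compete with itself
    return slack.sum()

# Toy data: three class prototypes and three embedded instances in 2-D.
P = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
Z = np.array([[0.5, 0.2], [3.5, 0.1], [0.3, 3.0]])
y = np.array([0, 1, 2])

pred = nearest_prototype(Z, P)
loss = margin_violations(Z, y, P)
```

In this toy setting every instance lies well inside the margin of its own prototype, so the total slack is zero. An instance equidistant from two prototypes would contribute slack equal to the margin.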