Publication

Learning to Tag and Tagging to Learn: A Case Study on Wikipedia

Source:

IEEE Intelligent Systems, Volume 23, Issue 5 (2008)

Abstract:

Natural language technologies have long been envisioned to play a crucial role in transitioning from the current Web to a more "semantic" Web. If anything, the significance of textual content on the Web has only increased with the rise of Web 2.0 and mass participation in content generation, which comes mostly in the form of text. Yet natural language technologies face significant challenges in dealing with the heterogeneity of Web content: specifically, the accuracy of systems trained on one corpus for a specific task degrades considerably when either the domain or the task changes. In this paper, we consider the problem of semantically annotating Wikipedia. We investigate a method for dealing with domain and task adaptation of semantic taggers in cases where parallel text and metadata are available. By creating a semantic mapping between the vocabularies of two sources, Wikipedia and the original annotated corpus, we are able to improve our tagger on Wikipedia. Moreover, by applying our tagger and the mapping between sources, we are able to significantly extend the metadata currently available in the DBpedia collection.
