Learning to Tag and Tagging to Learn: A Case Study on Wikipedia
Source:
IEEE Intelligent Systems, Volume 23, Issue 5 (2008)
Abstract:
Natural language technologies have long been envisioned to play a
crucial role in transitioning from the current Web to a more "semantic"
Web. If anything, the significance of textual content on the Web has
only increased with the rise of Web 2.0 and mass participation in
content generation, which comes mostly in the form of text. Yet,
natural language technologies face significant challenges
in dealing with the heterogeneity of Web content: specifically,
the accuracy of systems trained on one corpus for a specific task
degrades considerably when either the domain or task changes. In
this paper, we consider the problem of semantically annotating
Wikipedia. We investigate a method for dealing with domain and task
adaptation of semantic taggers in cases where parallel text and
metadata are available. By creating a semantic mapping between the
vocabularies of two sources, Wikipedia and the original annotated
corpus, we are able to improve our tagger on Wikipedia. Moreover,
by applying our tagger and the mapping between sources,
we are able to significantly extend the metadata currently
available in the DBpedia collection.