Publication

Diversifying Image Search with User Generated Content

Source:

Conference on Multimedia Information Retrieval, Vancouver, British Columbia (2008)

Keywords:

pseudo-relevance feedback, diversity, image retrieval, Flickr, retrieval performance, ambiguity

Abstract:

Large-scale image retrieval on the Web relies on the availability of short snippets of text associated with the image. This user-generated content is a primary source of information about the content and context of an image. While traditional information retrieval models focus on finding the most relevant document without consideration for diversity, image search requires results that are both diverse and relevant. This is problematic for images because they are represented very sparsely by text, and as with all user-generated content the text for a given image can be extremely noisy.

The contribution of this paper is twofold. First, we present a retrieval model which provides diverse results as a property of the model itself, rather than in a post-retrieval step. Relevance models offer a unified framework to afford the greatest diversity without harming precision. Second, we show that it is possible to minimize the trade-off between precision and diversity, and that estimating the query model from the distribution of tags favors the dominant sense of a query. Relevance models operating only on tags offer the highest level of diversity with no significant decrease in precision.
