A Statistical View of Binned Retrieval Models

Source:

European Conference on Information Retrieval (ECIR), pp. 175-186 (2008)

Abstract:

Parameterized retrieval models are the most commonly studied type of information retrieval model. These models assign term weights using a parameterized function of standard information retrieval features, such as term frequency, inverse document frequency, and document length. The recently proposed document-centric impact model suggests that very simple term weighting functions can be as effective as more complex ones. In this work, we describe a probabilistic model inspired by the document-centric impact model. We also propose novel techniques for binning terms and for estimating probabilities discriminatively. We analyze various aspects of the model and show that, on several TREC data sets, it is competitive in both effectiveness and efficiency.
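The abstract does not say how terms are assigned to bins or how bin probabilities are obtained. As a rough illustration of the general idea of binned term weighting only, the sketch below assumes a simple document-centric scheme: the distinct terms of each document are ranked by within-document term frequency and split into `num_bins` equal-sized groups, so a term's bin depends on its relative position in the document rather than its raw count. Each bin then carries a single probability, and a query is scored by summing the log-odds of the bins its terms fall into. The function names, the equal-size binning rule, and the fixed `bin_probs` values are all hypothetical; the paper's discriminative estimation step is not reproduced here.

```python
import math
from collections import Counter


def bin_document(tokens, num_bins):
    """Map each distinct term to a bin in 1..num_bins (hypothetical scheme).

    Terms are ranked by descending within-document frequency (ties broken
    alphabetically for determinism) and split into equal-sized groups;
    bin num_bins holds the most frequent terms.
    """
    tf = Counter(tokens)
    ranked = sorted(tf, key=lambda t: (-tf[t], t))
    n = len(ranked)
    # (i * num_bins) // n runs from 0 to num_bins - 1, so bins span 1..num_bins.
    return {term: num_bins - (i * num_bins) // n for i, term in enumerate(ranked)}


def score(query_terms, doc_bins, bin_probs):
    """Sum the log-odds of the bins that query terms occupy in the document.

    Query terms absent from the document contribute 0. bin_probs maps a bin
    to an assumed probability in (0, 1); bins with p > 0.5 add positive weight.
    """
    total = 0.0
    for term in query_terms:
        b = doc_bins.get(term)
        if b is not None:
            p = bin_probs[b]
            total += math.log(p / (1.0 - p))
    return total
```

For example, with two bins and assumed probabilities `{1: 0.6, 2: 0.8}`, a document `"a a a b b c d".split()` puts `a` and `b` in bin 2 and `c` and `d` in bin 1, so the query `["a", "c", "z"]` scores `log(4) + log(1.5)`. Because every term in a bin shares one weight, such a scheme only needs to store a small integer per posting, which is one reason binned (impact-style) models can be efficient.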