A Question of Relevancy
Search engine users are typically most interested in the items returned on the first page of the search results. It’s not often that users dig down into the ninth or tenth pages, because it takes too long and those results are simply not as relevant.
Given this fact, it seems appropriate for a ranking algorithm to spend most of its modeling efforts getting the topmost items right.
Though the current algorithms used by Yahoo! do a very good job at determining the relevance of a web page for a particular query, there is always room for improvement. That’s why Olivier Chapelle, a senior research scientist at Yahoo!, has spent the last several months trying to boost the ranking quality.
His work is based on the machine learning framework of structured output learning, where the input corresponds to a set of documents and the output is a ranking. This approach is different from the regression model commonly used in current search engine technology.
In essence, the framework of structured output learning provides a new opportunity: instead of viewing the outputs of the documents in an independent fashion, they are now coupled together in order to optimize the performance measure.
Why is this potentially a better approach? Well, by considering the documents independently, the regression model is unaware of the global ranking. By contrast, structured output learning can find a rule that produces a better overall ranking by taking into account all the documents associated with a query.
Indeed, in early tests, Chapelle’s algorithm has yielded 3 to 4 percent improved accuracy rates on several public and commercial ranking datasets. These results are featured in a paper entitled Large Margin Optimization of Ranking Measures, which Chapelle and his co-author.
Though excited by his initial results, Chapelle admits there is still a long way to go. "The next step is to do more systematic experiments to validate the usefulness of this method," he says.
And if everything goes according to plan? "The long-term hope is eventually this model will be put in production and used for all searches on Yahoo!," says Chapelle.