The impact of caching on search engines
Source:
ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) (2007)
Abstract:
In this paper we study the trade-offs in designing efficient caching
systems for Web search engines.
We explore the impact of different approaches,
such as static vs. dynamic caching, and
caching query results vs. caching posting lists.
Using a query log spanning a whole year, we explore the limitations of
caching and demonstrate that caching posting lists can achieve higher
hit rates than caching query answers.
We propose a new algorithm for static caching of posting lists,
which outperforms previous methods.
We also study the problem of finding the optimal way to
split the static cache between answers and posting lists.
Finally, we measure how the changes in the query log affect the
effectiveness of static caching, given our observation that the
distribution of the queries changes slowly over time.
Our results and observations are applicable to different levels
of the data-access hierarchy, for instance, for a memory/disk layer or
a broker/remote server layer.
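
To make the static vs. dynamic distinction concrete, the following is a minimal sketch, not the paper's algorithm or data. It assumes a static cache filled once with the most frequent queries of a past log, and it uses LRU as a stand-in for a dynamic policy. The function names, the toy query streams, and the capacity are all hypothetical.

```python
from collections import Counter, OrderedDict


def static_hit_rate(train, test, capacity):
    """Hit rate of a static cache filled once from a past (training) log."""
    # Assumption: keep the `capacity` most frequent training queries.
    cached = {q for q, _ in Counter(train).most_common(capacity)}
    hits = sum(1 for q in test if q in cached)
    return hits / len(test)


def lru_hit_rate(test, capacity):
    """Hit rate of a dynamic LRU cache that starts empty."""
    cache = OrderedDict()
    hits = 0
    for q in test:
        if q in cache:
            hits += 1
            cache.move_to_end(q)  # mark as most recently used
        else:
            cache[q] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(test)


# Hypothetical toy query streams, for illustration only.
train_log = ["a", "a", "a", "b", "b", "c"]
test_log = ["a", "b", "a", "d", "a", "b", "e", "a"]

print(static_hit_rate(train_log, test_log, capacity=2))
print(lru_hit_rate(test_log, capacity=2))
```

On this toy stream the static cache happens to score higher because the popular queries stay the same between the two logs. Real outcomes depend on the query distribution and on how quickly it drifts.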