Publication

An Empirical Evaluation of a Distributed Clustering-Based Index for Metric Space Databases

Source:

International Workshop on Similarity Search and Applications (SISAP 2008), IEEE-CS Press, April 11-12, Mexico (2008)

Abstract:

Similarity search has proven suitable for searching very large collections of unstructured data objects. We are interested in efficient parallel query processing under continuous streams of queries, as in search engines. A number of sequential index data structures for this purpose have been proposed so far. This paper focuses on one representative of a class of these data structures, namely one based on clustering, for which we evaluate different ways of distributing the index to support parallelism on a set of processors. Our study reveals that the intuitive choices for both data distribution and model of computing are not efficient in practice. The best results are obtained with a strategy that appears to be more costly in construction, but we show that in practice this cost is not significant.