An Empirical Evaluation of a Distributed Clustering-Based Index for Metric Space Databases
Source:
International Workshop on Similarity Search and Applications (SISAP 2008), IEEE-CS Press, April 11-12, Mexico (2008)
Abstract:
Similarity search has proven suitable for searching very large collections of unstructured data objects. We are interested in efficient parallel query processing under continuous streams of queries, as in search engines. A number of sequential index data structures have been proposed for this purpose. This paper focuses on one representative of a class of these data structures, namely one based on clustering, for which we evaluate different ways of distributing the index to support parallelism on a set of processors. Our study reveals that the intuitive methods for both data distribution and the model of computation are not efficient in practice. The best results are obtained with a strategy that appears to be more costly in construction, but we show that in practice this cost is not significant.