Query-log mining for detecting spam
Source:
Fourth International Workshop on Adversarial Information Retrieval on the Web, ACM Press, Beijing, China (2008)
ISBN:
978-1-60558-159-0
URL:
http://airweb.cse.lehigh.edu/2008/submissions/castillo_2008_query_log_detection_spam.pdf
Abstract:
Every day millions of users search for information on the web via
search engines, and provide implicit feedback on the results shown
for their queries by clicking, or not clicking, on them. This feedback is
encoded in the form of a query log that consists of a sequence of
search actions, one per user query, each describing the following
information: (i) terms composing a query, (ii) documents returned
by the search engine, (iii) documents that have been clicked, (iv)
the rank of those documents in the list of results, (v) date and time
of the search action/click, (vi) an anonymous identifier for each
session, and more.
In this work, we investigate the idea of characterizing the documents
and the queries belonging to a given query log with the goal
of improving algorithms for detecting spam, both at the document
level and at the query level.
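
The query-log record enumerated in the abstract can be pictured as a small data structure. The following is a minimal sketch only, not the authors' format or method: the class name `SearchAction`, all field names, and the helper `queries_per_document` are hypothetical, chosen to mirror items (i) through (vi). The helper shows one simple way a log could be used to characterize a document by the queries that led to clicks on it.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Set


@dataclass
class SearchAction:
    """One query-log entry. All field names are hypothetical."""
    query_terms: List[str]     # (i) terms composing the query
    returned_urls: List[str]   # (ii) documents returned, in rank order
    clicked_urls: List[str]    # (iii) documents that were clicked
    timestamp: datetime        # (v) date and time of the search action
    session_id: str            # (vi) anonymous session identifier

    def clicked_ranks(self) -> Dict[str, int]:
        """(iv) 1-based rank of each clicked document in the result list.

        Assumes every clicked URL also appears in returned_urls;
        list.index raises ValueError otherwise.
        """
        return {url: self.returned_urls.index(url) + 1
                for url in self.clicked_urls}


def queries_per_document(log: List[SearchAction]) -> Dict[str, Set[str]]:
    """Map each clicked document to the set of queries that led to a click."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for action in log:
        query = " ".join(action.query_terms)
        for url in action.clicked_urls:
            result[url].add(query)
    return dict(result)
```

Under these assumptions, a document that attracts clicks from many unrelated queries would show up as a large, heterogeneous set in the returned mapping. Whether that is a useful spam signal is a question for the paper itself, not something this sketch establishes.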