Dynamic characterization of a large Web graph
Source:
Proc. of the 1st Web Science Conference (short paper), Athens (2009)
Abstract:
The Web is extremely dynamic, as shown by the rapid and significant growth it has experienced in the last decade and by its continuous evolution through the creation and deletion of pages and hyperlinks. Consequently, analyzing the temporal evolution of the Web has become a crucial task that can provide search engines with valuable information for refining crawling policies, improving ranking models, or detecting spam.
In this paper we study a temporal dataset made of twelve 100M-page snapshots of the .uk domain.
We analyze the data at the level of interconnection between hosts, studying the temporal evolution of 3,500 sites with respect to a number of topological properties, including degrees, number of degree supporters and eigenvector distributions.
Our results show that a major fraction of the sites exhibits very stable behavior. However, a non-negligible percentage of hosts is characterized by increasing or decreasing evolution patterns.
Interestingly, we observe positive correlations both in the growth of different properties for the same node and in the temporal evolution of sites that are neighbors in the host graph.
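The kind of analysis summarized above can be illustrated with a minimal sketch. It is not the authors' method: it assumes each host has one value per snapshot for a given property (for example in-degree), labels the series as stable, increasing, or decreasing using a least-squares slope, and then correlates the growth of two properties across hosts. The function names, the threshold `tol`, and the toy data are all hypothetical.

```python
from statistics import mean


def slope(series):
    """Least-squares slope of a series against its snapshot index."""
    xs = range(len(series))
    x_bar, y_bar = mean(xs), mean(series)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def classify(series, tol=0.05):
    """Label a series by its slope relative to its mean.

    tol is a hypothetical threshold, not a value taken from the paper.
    """
    y_bar = mean(series)
    if y_bar == 0:
        return "stable"
    rel = slope(series) / y_bar
    if rel > tol:
        return "increasing"
    if rel < -tol:
        return "decreasing"
    return "stable"


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


# Toy data: in-degree of three hypothetical hosts over twelve snapshots.
in_degree = {
    "a.example.uk": [10] * 12,
    "b.example.uk": list(range(10, 34, 2)),
    "c.example.uk": list(range(32, 8, -2)),
}
labels = {host: classify(series) for host, series in in_degree.items()}

# Growth of a second, made-up property for the same hosts, in the same order.
in_degree_growth = [slope(s) for s in in_degree.values()]
other_growth = [0.0, 0.001, -0.001]
growth_correlation = pearson(in_degree_growth, other_growth)
```

A relative slope is used here so that large and small hosts are compared on the same scale; other trend measures would be equally reasonable.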