Publication

Characterization of national Web domains

Source:

ACM Transactions on Internet Technology, Volume 7, Number 2 (2007)

URL:

http://dx.doi.org/10.1145/1239971.1239973

Keywords:

web-characterization

Abstract:

In recent years, several studies characterizing the public Web space of various national domains have been published. The pages of a country are an interesting set for studying the characteristics of the Web, because they are at once diverse (as they are written by several authors) and consistent (as they share a common geographical, historical and cultural context). This paper discusses the methodologies used for presenting the results of Web characterization studies, including the granularity at which different aspects are presented, and a separation of concerns between contents, links, and technologies. Based on this, we present a side-by-side comparison of the results of 12 Web characterization studies comprising over 120 million pages from 24 countries. The comparison unveils similarities and differences between the collections, and sheds light on how certain results of a single Web characterization study may be valid in the context of the full Web.