Decoding the structure of the WWW: facts versus sampling biases

Computer Science – Networking and Internet Architecture

Scientific paper

Details

10 pages, 19 figures. Values in Table 2 and Figure 1 corrected. Figure 7 updated. Minor changes in the text.

Understanding the immense and intricate topological structure of the World Wide Web (WWW) is a major scientific and technological challenge. It has been tackled recently by characterizing the properties of representative graphs in which vertices and directed edges are identified with web pages and hyperlinks, respectively. Data gathered in large-scale crawls have been analyzed by several groups, resulting in a general picture of the WWW that encompasses many of the complex properties typical of rapidly evolving networks. In this paper, we report a detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers. We find that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data. This raises the question of whether sampling biases and structural differences among Web crawls induce properties that are not representative of the actual global underlying graph. In order to provide a more accurate characterization of the Web graph and to identify observables that clearly discriminate with respect to the sampling process, we study the behavior of degree-degree correlation functions and the statistics of reciprocal connections. The latter appear to capture the relevant correlations of the WWW graph and to carry most of the topological information of the Web. The analysis of this quantity is also of major interest in relation to the navigability and searchability of the Web.
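The two observables named in the abstract, reciprocal connections and degree-degree correlations, can be illustrated on a small directed graph. The sketch below is not taken from the paper: the abstract does not give its exact definitions, so this assumes two common ones, namely reciprocity as the fraction of hyperlinks whose reverse link also exists, and a correlation function giving the mean in-degree of the pages linked from pages of out-degree k. The function names and the toy edge list are hypothetical.

```python
from collections import defaultdict


def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) is also present.

    Assumed definition; self-loops and duplicate links are ignored.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def mean_target_in_degree_by_out_degree(edges):
    """Map each out-degree k to the mean in-degree of pages linked from
    pages that have out-degree k (one assumed form of a degree-degree
    correlation function for a directed graph)."""
    edge_set = {(u, v) for u, v in edges if u != v}
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for u, v in edge_set:
        out_deg[u] += 1
        in_deg[v] += 1

    totals = defaultdict(float)
    counts = defaultdict(int)
    for u, v in edge_set:
        k = out_deg[u]
        totals[k] += in_deg[v]
        counts[k] += 1
    return {k: totals[k] / counts[k] for k in counts}


# Hypothetical toy crawl: pages a and b link to each other.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "a")]
toy_reciprocity = reciprocity(toy_edges)
toy_correlation = mean_target_in_degree_by_out_degree(toy_edges)
```

On this toy graph two of the five links are reciprocated, so `toy_reciprocity` is 0.4. Comparing such values across crawls of the same domain is one simple way to check whether a measure is sensitive to the sampling process, though real crawls would need a streaming or sparse-matrix implementation rather than in-memory sets.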
