The structure of broad topics on the Web

Computer Science – Information Retrieval

Scientific paper

The Web graph is a giant social network whose properties have been measured and modeled extensively in recent years. Most such studies concentrate on the graph structure alone and do not consider textual properties of the nodes. Consequently, Web communities have been characterized purely in terms of graph structure, not page content. We propose that a topic taxonomy such as Yahoo! or the Open Directory provides a useful framework for understanding the structure of content-based clusters and communities. In particular, using a topic taxonomy and an automatic classifier, we can measure the background distribution of broad topics on the Web and analyze the ability of recent random-walk algorithms to draw samples that follow such distributions. In addition, we can measure the probability that a page about one broad topic will link to a page about another broad topic. Extending this experiment, we can measure how quickly topic context is lost while walking randomly on the Web graph. Estimates of this topic mixing distance may explain why a global PageRank is still meaningful in the context of broad queries. In general, our measurements may prove valuable in the design of community-specific crawlers and link-based ranking systems.
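
A minimal sketch of the topic mixing measurement, under simplifying assumptions that are ours rather than the paper's: the paper walks the Web graph itself, whereas here the graph is collapsed into a small Markov chain over broad topics, where entry P[i][j] stands for the probability that a page about topic i links to a page about topic j. The 3-topic matrix TOY_P is invented toy data, not a measurement, and the helper names (step, stationary, total_variation, mixing_steps) are hypothetical. The sketch counts how many link-following steps it takes before a walk started on one topic has a topic distribution within total-variation distance eps of the background (stationary) distribution.

```python
# Hypothetical sketch: a topic-level Markov chain stands in for the Web graph.
# TOY_P is made-up toy data, NOT measurements from the paper.
# Each row must sum to 1 (row-stochastic).
TOY_P = [
    [0.8, 0.1, 0.1],  # topic 0 mostly links to itself
    [0.1, 0.8, 0.1],  # topic 1
    [0.1, 0.1, 0.8],  # topic 2
]


def step(dist, P):
    """One random-walk step: propagate a distribution over topics through P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P, iters=10_000, tol=1e-12):
    """Approximate the background (stationary) topic distribution by power iteration."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iters):
        nxt = step(dist, P)
        if max(abs(a - b) for a, b in zip(dist, nxt)) < tol:
            return nxt
        dist = nxt
    return dist


def total_variation(p, q):
    """Total-variation distance between two distributions over the same topics."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixing_steps(P, start_topic, eps=0.05, max_steps=1000):
    """Steps until a walk starting on start_topic is within eps of the background.

    Returns None if the walk does not get that close within max_steps.
    """
    background = stationary(P)
    dist = [0.0] * len(P)
    dist[start_topic] = 1.0  # walk starts with certainty on one topic
    for t in range(max_steps + 1):
        if total_variation(dist, background) <= eps:
            return t
        dist = step(dist, P)
    return None


if __name__ == "__main__":
    print(mixing_steps(TOY_P, start_topic=0))  # → 8
```

For this toy matrix the background distribution is uniform, and a walk starting on topic 0 has total-variation distance (2/3)(0.7)^t after t steps, so it first drops below 0.05 at t = 8. Raising the diagonal (self-link) probabilities slows mixing, which matches the intuition that topics whose pages mostly link among themselves retain context over longer walks.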
