Information flow and the links structure of the web

By Zbigniew Koziol

It is widely believed that the infrastructure of the Internet is not governed by any institution. That was indeed the case in the past, while the network was available only to military and educational institutions. Once ordinary people became regular users and corporations began to invest more money, governments became concerned about security and about citizens having too much freedom to exchange information.

This time, however, I will not write about institutional control over the Internet. Let us instead examine the hidden structure of the web that arises from its very nature: a structure that nobody organized, and that follows from the way web authors create their pages and link them together. The results shown here were announced at the Ninth International World Wide Web Conference, which was held in May in Amsterdam. The study was carried out by AltaVista, Compaq, and IBM, and it analyzed more than 1.5 billion links on over 200 million pages.

The picture shows, in schematic form, the outcome of that enormous computational effort. The colors represent the various categories of web pages, and the arrows show the direction of the links between them.

The yellow portion represents pages that no other web pages link to (there are 44 million of them). They do, however, contain links to the outside world: to the web pages represented by the largest, red circle, to other pages not shown on this graph (the two arrows going out of the yellow portion), and to the green portion (the links represented by the bottom arrow in the middle of the graph).

The most important part is the red circle: a collection of 56 million web pages that are strongly interconnected. These pages link to the green portion, which represents 44 million web pages. The green portion is also linked to from other, outside pages, as represented by the two vertical arrows.

And, not surprisingly, about 10% of web pages (the blue spots on the graph) are neither linked to from anywhere nor link anywhere themselves. In other words, they cannot be reached from any other page, and in practice nobody knows about them.
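
For readers who like to experiment, the four categories described above can be reproduced on a toy set of links. The sketch below is only an illustration under my own assumptions, not the method used in the study: the function names (`reachable`, `classify_pages`), the `seed` and `extra_pages` parameters, and the example page names are all hypothetical. It assumes we already know one page (`seed`) that belongs to the strongly interconnected core, and it sorts every other page by whether it can reach the core, can be reached from it, or neither.

```python
from collections import defaultdict


def reachable(start_nodes, adjacency):
    """Return every node reachable from start_nodes by following adjacency."""
    seen = set(start_nodes)
    stack = list(start_nodes)
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def classify_pages(links, seed, extra_pages=()):
    """Sort pages into core / in / out / other, relative to the core containing seed.

    links       -- iterable of (source_page, target_page) pairs
    seed        -- a page assumed to lie inside the strongly interconnected core
    extra_pages -- pages with no links at all, which never appear in `links`
    """
    forward = defaultdict(set)   # page -> pages it links to
    backward = defaultdict(set)  # page -> pages that link to it
    pages = set(extra_pages) | {seed}
    for src, dst in links:
        forward[src].add(dst)
        backward[dst].add(src)
        pages.update((src, dst))

    downstream = reachable([seed], forward)   # pages the seed can reach
    upstream = reachable([seed], backward)    # pages that can reach the seed
    core = downstream & upstream              # mutually reachable pages

    return {
        "core": core,                            # the red circle
        "in": upstream - core,                   # the yellow portion
        "out": downstream - core,                # the green portion
        "other": pages - upstream - downstream,  # everything else
    }


# A toy web: a, b and c link in a cycle and form the core.
links = [
    ("a", "b"), ("b", "c"), ("c", "a"),
    ("x", "a"),   # x links into the core, but nothing links to x
    ("c", "y"),   # the core links to y, but y does not link back
    ("p", "q"),   # p and q are cut off from the core
]
result = classify_pages(links, seed="a", extra_pages=["lonely"])
for category, members in result.items():
    print(category, sorted(members))
```

On a real crawl of hundreds of millions of pages one would not pick the seed by hand; the core would first have to be found with a strongly-connected-components algorithm, and the "other" group would need further splitting into pages that hang off the yellow or green portions and pages that are truly isolated.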

While this diagram may seem useless to the average web user, it matters to AltaVista. The company will use this map to improve its search results.