Activity of mailing list users - a mathematical approach

By Zbigniew Kozioł

When George Kingsley Zipf, a professor of linguistics at Harvard, found in the 1940s that the frequency of words in English texts can be described very simply by a power law, no one expected that the same or a similar mathematical dependence could be used to analyze a broad range of phenomena in nature and in the social sciences.

At present, the so-called Zipf's law, or its generalization using a stretched-exponential function, has been applied to describe, for instance, the following:

- the distributions of radio and light emissions from galaxies,
- world, U.S., and French agglomeration sizes, and also country population sizes,
- Vostok temperature variations,
- frequency of citations of the most cited physicists in the world,
- income or revenue of companies,
- internet stock valuations, and many more.
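
For reference, the two functional forms mentioned above can be written as follows. This is only one common way of writing them in terms of rank; the symbols f, r, alpha, f_0, r_0 and c are notation assumed here and do not come from the original studies.

```latex
% Zipf's law: the frequency f of the item with rank r falls off as a power of the rank
f(r) \propto r^{-\alpha}, \qquad \alpha \approx 1

% Stretched exponential: the logarithm of f is linear in a power c of the rank
f(r) = f_0 \exp\!\left[ -\left( \frac{r}{r_0} \right)^{c} \right], \qquad 0 < c < 1
```

For c = 1 the second form reduces to an ordinary exponential; smaller values of c "stretch" the tail of the distribution.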

Recently, a growing effort has been directed at understanding the statistical regularities observed on the Internet. This subject is still largely unexplored. So far, Zipf's law or a stretched-exponential distribution has been found to fit the following well:

- the number of pages per web site,
- the number of links a site receives,
- the amount of traffic referred to a site from other sites,
- the frequency of words searched on the Internet.

To better illustrate how the analysis is performed, consider an example based on original data compiled by the author of this article. The activity of members of two mailing lists, Poland-L and APAP, was measured. Poland-L is the oldest and largest Polish mailing list (in Polish) devoted to all aspects of Polish culture and politics; its archives are available for browsing on a server of the University of Buffalo.

APAP (Association of Polish-American Professionals) is the largest mailing list for Poles and everyone interested in Polish matters in which discussions are held in English; its archives are hosted on a server at Stony Brook University.

Between January 1997 and June 2000, 28510 letters were sent to Poland-L and 25475 to the APAP discussion list. The number of postings by each author was counted, and the authors were then sorted by their number of postings. This is called sorting by rank, where the rank simply enumerates the authors. The frequency of postings is then plotted as a function of a certain power of the rank (see the graph). Put simply, this means that the data are described by a stretched-exponential function. In particular, as shown on the graph, the exponent used is 0.45 for APAP and 0.6 for Poland-L. It is difficult to say what these exponents mean mathematically, since there is no good theory describing a social system as complex as a mailing list, and there is no known research on the subject.
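
The rank-ordering procedure can be sketched in Python. This is a minimal illustration, not the program actually used for the graph. It assumes that the stretched-exponential form is N(r) = A exp(b r^c), so that ln N is a straight line when plotted against r^c, and that the exponent c is chosen by trying a few values and keeping the one that gives the best straight line. The function names and the synthetic data are hypothetical.

```python
import math
from collections import Counter


def rank_counts(authors):
    """Count postings per author and sort the counts from most to least active."""
    return sorted(Counter(authors).values(), reverse=True)


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    a = my - b * mx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return a, b, r2


def fit_stretched_exponential(counts, exponents):
    """For each trial exponent c, fit ln(count) against rank**c.

    Returns (c, a, b, r_squared) for the exponent with the highest r_squared.
    Assumes counts are positive and sorted by rank (rank 1 first).
    """
    ys = [math.log(n) for n in counts]
    best = None
    for c in exponents:
        xs = [r ** c for r in range(1, len(counts) + 1)]
        a, b, r2 = linear_fit(xs, ys)
        if best is None or r2 > best[3]:
            best = (c, a, b, r2)
    return best


# One "From" entry per letter (hypothetical sample, not the real archives).
sample = ["anna", "jan", "anna", "ewa", "anna", "jan"]
print(rank_counts(sample))

# Synthetic counts that follow the assumed form exactly, with c = 0.5.
synthetic = [500 * math.exp(-0.5 * r ** 0.5) for r in range(1, 101)]
best = fit_stretched_exponential(synthetic, [0.3, 0.4, 0.5, 0.6, 0.7])
print("best exponent:", best[0])
```

With real data, the list of authors would come from the "From" headers in the list archives, and the trial exponents would be scanned on a finer grid.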

Nevertheless, some interesting conclusions can be drawn by looking more closely at the results.

1. A very small number of mailing list participants post a large share of the letters (3-5 of them post 10-20% of all messages!).

2. At the same time, a large part (30-50%) of the participants are not active at all!
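
Both figures are easy to compute once the posting counts are known. The sketch below shows one possible way of doing it. It assumes that a list of all subscribers is available in addition to the list of letter authors; the article does not say how inactive members were identified, so that part in particular is an assumption. The function names and sample data are hypothetical.

```python
def top_share(counts, k):
    """Fraction of all postings written by the k most active authors."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


def inactive_fraction(subscribers, authors):
    """Fraction of subscribers who never posted (assumes a subscriber list exists)."""
    subs = set(subscribers)
    if not subs:
        return 0.0
    active = subs & set(authors)
    return 1 - len(active) / len(subs)


# Hypothetical posting counts per author.
print(top_share([50, 30, 10, 5, 5], 2))

# Hypothetical subscriber list and one author entry per letter.
print(inactive_fraction(["anna", "jan", "ewa", "piotr"], ["anna", "anna", "jan"]))
```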

These observations become very important when we realize that they describe well a broad range of social and economic phenomena. For instance, similar conclusions hold when we look at the number of visits to various web sites: very few of them are visited frequently (for instance, almost 50% of all searches are performed on Yahoo!). It also seems very likely that the political activity of members of a democratic society could be described in a similar way: only very few of them hold political power and influence, while the political role of the majority is essentially passive and insignificant.