First published in 2000. This page contains words as entered by real internet users into the search forms of several search engines. The data presented here were obtained by analyzing a large number of queries entered at several major search engines: SearchUK, ByteSearch, SavvySearch, and McKinley. These sites were monitored nonstop for approximately 10 days, and the information gathered (several hundred MB of text data) was analyzed with custom scripts written in Perl. The following procedure has been implemented:
As a result, a list of 640,000 distinct search terms has been created. The presented database is, to the best of my knowledge, the largest one available for free at the moment, and it is of great value to anyone who takes advertising on the internet seriously. Here is a listing of the number of searches for the most frequently queried terms at the time of the research:
67036 mp3
43627 free
41132 find
15635 download
15096 software
14715 sex
14268 web
13516 information
12740 music
12442 car
11564 online
10661 internet
10632 computer
10568 pictures
10417 home
9926 video
9765 search
9698 games
Figure 1 shows how the number of distinct search terms depends on how often those terms were searched. For instance, the term video was searched 9926 times, and no other term was searched exactly that number of times. Hence, there is only one term that was searched 9926 times. The point corresponding to this term has a value of 1 on the vertical axis and a value of 9926 on the horizontal axis, so it lies at the bottom of the vertical axis and toward the right of the horizontal axis. Another example: the terms schools, cheap, and camera were each searched 2914 times. The point corresponding to these terms has a value of 3 on the vertical axis and a value of 2914 on the horizontal axis.
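The counting behind Figure 1 can be sketched in a few lines. This is a minimal Python illustration, not the original Perl scripts; the function name terms_per_count and the small sample dictionary are assumptions made for the example, and the sample holds only counts quoted on this page.

```python
from collections import Counter

# A small sample of the data: counts quoted on this page only.
# The full database of 640,000 terms is not reproduced here.
searches = {
    "mp3": 67036,
    "free": 43627,
    "video": 9926,
    "search": 9765,
    "games": 9698,
    "schools": 2914,
    "cheap": 2914,
    "camera": 2914,
}


def terms_per_count(searches):
    """Map each search count to the number of distinct terms with that count."""
    return Counter(searches.values())


points = terms_per_count(searches)

# Each (count, number of terms) pair is one point of a Figure 1 style plot:
# the count goes on the horizontal axis, the number of terms on the vertical.
for count in sorted(points):
    print(count, points[count])
```

With this sample, the count 9926 maps to one term (video) and the count 2914 maps to three terms (schools, cheap, camera), matching the two examples in the text.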
Another way of presenting the data is to plot the number of searches as a function of the rank (the position in order of popularity) of a given search term, as shown in Figure 2.
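Assigning ranks is a matter of sorting the terms by their number of searches in descending order. The sketch below is a Python illustration rather than the original analysis code; the helper name rank_terms and the alphabetical tie-breaking rule are assumptions, and the input is the list of most frequently queried terms given on this page.

```python
# The most frequently queried terms, as listed on this page.
top_terms = {
    "mp3": 67036, "free": 43627, "find": 41132, "download": 15635,
    "software": 15096, "sex": 14715, "web": 14268, "information": 13516,
    "music": 12740, "car": 12442, "online": 11564, "internet": 10661,
    "computer": 10632, "pictures": 10568, "home": 10417, "video": 9926,
    "search": 9765, "games": 9698,
}


def rank_terms(searches):
    """Return (rank, term, count) tuples, rank 1 being the most searched term.

    Ties are broken alphabetically; this is an assumption, since the
    original page does not say how equal counts were ordered.
    """
    ordered = sorted(searches.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, term, count)
            for rank, (term, count) in enumerate(ordered, start=1)]


ranked = rank_terms(top_terms)

# Each (rank, count) pair is one point of a Figure 2 style plot.
for rank, term, count in ranked:
    print(rank, count, term)
```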
Figure 3 illustrates that the data shown in Figure 2 can be fitted well by a straight line if the number of searches is plotted on a logarithmic scale against the rank raised to a certain power. In this case we see that the number of searches is proportional to exp(a*r^{0.2}), where r is the rank of a search term and a is a certain coefficient.
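If the number of searches is proportional to exp(a*r^{0.2}), then its logarithm is a linear function of r^{0.2}, so the coefficient a can be estimated with an ordinary least-squares line. The following Python sketch shows one way to do that; it is not the method used to produce Figure 3. The function name fit_stretched_exponential is hypothetical, and the check at the end uses synthetic counts generated from the model with made-up parameters, not the real data.

```python
import math


def fit_stretched_exponential(counts, power=0.2):
    """Fit ln(count) = ln(c) + a * rank**power by least squares.

    counts must be ordered by rank, most searched term first.
    Returns the pair (a, c).
    """
    xs = [rank ** power for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                  # slope of the straight line
    ln_c = mean_y - a * mean_x     # intercept of the straight line
    return a, math.exp(ln_c)


# Synthetic check: the values of true_a and true_c are made up for
# illustration and are NOT results from the search term database.
true_a, true_c = -2.0, 500000.0
synthetic = [true_c * math.exp(true_a * rank ** 0.2)
             for rank in range(1, 1001)]

a, c = fit_stretched_exponential(synthetic)
print(a, c)
```

On data that follow the model exactly, the fit recovers the parameters used to generate them. On real counts the points scatter around the line, and the quality of the fit depends on the exponent chosen for the rank.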
