The terms searched most frequently by web users

Zbigniew Koziol, softquake@gmail.com

First published in 2000.

This page contains words exactly as real internet users entered them into the search forms of several search engines. Some of these words may be offensive to some readers. If that applies to you, please consider leaving this page without reading further.

The data presented here were obtained by analysing a large number of queries entered at several major search engines:

    SearchUK
    ByteSearch
    SavvySearch
    McKinley 

These sites were monitored non-stop for approximately 10 days, and the information gathered (several hundred MB of text data) was analyzed with custom-developed Perl scripts. The following procedure was implemented:

  1. All words were converted to lowercase.
  2. Queries consisting of several words were split into separate words.
  3. Certain special characters were converted to their plain ASCII equivalents.
  4. A small amount of garbage was removed.

As a result, a list of 640,000 distinct search terms was created.
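The procedure above can be sketched in a few lines. This is a minimal illustration in Python rather than the original Perl scripts, which are not shown here. The function names, the sample queries, and the rule used for "garbage" (anything that is not a letter or digit) are assumptions made for the example.

```python
import re
import unicodedata
from collections import Counter


def to_ascii(text):
    """Replace accented characters with their closest plain ASCII letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def normalize_query(query):
    """Lowercase a query, convert it to ASCII, and split it into words."""
    cleaned = to_ascii(query.lower())
    # Assumed garbage rule: keep only runs of letters and digits.
    return re.findall(r"[a-z0-9]+", cleaned)


def count_terms(queries):
    """Count how often each word occurs across all queries."""
    counts = Counter()
    for query in queries:
        counts.update(normalize_query(query))
    return counts


# Made-up sample queries, used only to exercise the functions.
sample_queries = ["Free MP3 download", "mp3", "Café music", "FREE software!!"]
term_counts = count_terms(sample_queries)
print(term_counts.most_common(3))
```

The number of keys in `term_counts` corresponds to the number of distinct search terms, and the values are the search counts of the kind listed below.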

The database presented here is, to the best of my knowledge, the largest one freely available at the moment, and it is of great value to anyone who takes advertising on the internet seriously. Here is a listing of the number of searches for the most frequently queried terms at the time of the research:

67036	mp3
43627	free
41132	find
15635	download
15096	software
14715	sex
14268	web
13516	information
12740	music
12442	car
11564	online
10661	internet
10632	computer
10568	pictures
10417	home
9926	video
9765	search
9698	games


Fig. 1. Number of distinct terms searched as a function of the number of searches performed, for 640,000 distinct search terms. A simple power-law dependence fits the data well.



Figure 1 shows how the number of distinct search terms depends on how often those terms were searched. For instance, the term video was searched 9926 times, and no other term was searched exactly that number of times. Hence there is only one term that was searched 9926 times. The point corresponding to this term lies at the bottom of the vertical axis and toward the right of the horizontal axis. Another example: the terms schools, cheap, and camera were each searched 2914 times. The point corresponding to these terms has a value of 3 on the vertical axis and 2914 on the horizontal axis.
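The construction of the points in Figure 1 can be illustrated with the two examples just given. The sketch below is in Python and uses only the four terms quoted in the text; the variable names are chosen for the example, and a real run would use all 640,000 terms.

```python
from collections import Counter

# Search counts quoted in the text; a real run would use the full term list.
term_counts = {
    "video": 9926,
    "schools": 2914,
    "cheap": 2914,
    "camera": 2914,
}

# Map each search count to the number of distinct terms that share it.
terms_per_count = Counter(term_counts.values())

# Each pair is one point: horizontal value first, vertical value second.
for searches, n_terms in sorted(terms_per_count.items()):
    print(searches, n_terms)
```

Each printed pair gives the horizontal value (number of searches) followed by the vertical value (number of distinct terms searched that many times).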



Fig. 2. Number of searches performed as a function of rank (the order, popularity) of a given search term.



Another way of presenting the data is to plot the number of searches performed as a function of the rank (the order, or popularity) of a given search term, as shown in Figure 2.
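Assigning ranks is a matter of sorting the terms by their number of searches. The following Python sketch does this for the six most frequent terms from the listing above, deliberately entered out of order; the variable names are assumptions made for the example.

```python
# The six most searched terms from the listing, entered out of order.
term_counts = {
    "download": 15635,
    "mp3": 67036,
    "sex": 14715,
    "free": 43627,
    "software": 15096,
    "find": 41132,
}

# Sort by number of searches, highest first; rank 1 is the most popular term.
ordered = sorted(term_counts.items(), key=lambda item: item[1], reverse=True)
ranked = [
    (rank, term, searches)
    for rank, (term, searches) in enumerate(ordered, start=1)
]

for rank, term, searches in ranked:
    print(rank, term, searches)
```

Plotting the third column against the first gives a curve of the kind described for Figure 2.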



Fig. 3. Stretched-exponential-like dependence of the number of searches performed as a function of rank.



Figure 3 illustrates that the data shown in Figure 2 can be fitted well by a straight line if the rank raised to a certain power is plotted on the horizontal axis and the number of searches on a logarithmic vertical axis. In this case the number of searches is proportional to exp(a*r^0.2), where r is the rank of a search term and a is a constant coefficient.
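If the number of searches N is proportional to exp(a*r^0.2), then ln(N) is a linear function of r^0.2, so the coefficient a can be estimated with an ordinary least-squares line. The Python sketch below shows one way this might be done. It runs on synthetic data generated from made-up coefficients, not on the measured data, and the function name is hypothetical.

```python
import math


def fit_stretched_exponential(ranks, searches, power=0.2):
    """Least-squares fit of ln(N) = a * r**power + b; returns (a, b)."""
    xs = [r ** power for r in ranks]
    ys = [math.log(n) for n in searches]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Synthetic data from made-up coefficients, NOT the measured search data.
true_a, true_b = -1.5, 12.0
ranks = list(range(1, 1001))
searches = [math.exp(true_a * r ** 0.2 + true_b) for r in ranks]

a, b = fit_stretched_exponential(ranks, searches)
print(round(a, 3), round(b, 3))
```

On real data the exponent 0.2 could itself be varied to find the value that makes the plot most nearly straight.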