Wikipedia:Size comparisons

This is an old revision of this page, as edited by 217.118.43.66 (talk) at 23:19, 6 May 2004 (fixed typo). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This article compares the size of Wikipedia with other encyclopedias and information collections.

Source material from which Wikipedia statistics in this article are derived is available here; the Footnote on WikiStatistics section at the end of this page provides technical discussion of this article.

At the start of May 2004, the English Wikipedia alone had 239,000 articles of 200 characters or greater, and the combined Wikipedias for all other languages exceeded the English Wikipedia in size, giving a combined total of 582,000 articles in 82 languages. The English Wikipedia alone has now reached 79 million words in size, comfortably eclipsing the largest previously existing encyclopedias.

Overall article growth continues to be exponential, but at a rate lower than originally estimated. This analysis suggests that the "organic" hand-edited growth rate of a mature Wikipedia is slightly less than a doubling per year, with younger Wikipedias growing somewhat faster.

Not bad for an encyclopedia which is only three years old. But we must do better! Many of the articles are still of poor quality, and the average article length is only a little over half that of Encyclopædia Britannica. As the Wikipedia grows more comprehensive, efforts are expected to move more towards increasing the quality, scope and interlinkage of existing articles, rather than the creation of new articles. It is also anticipated that the Wikipedia will grow to include a global gazetteer as part of its function.

See Wikipedia:Modelling Wikipedia's growth for more educated guesses about the potential growth of Wikipedia.

Comparison of Encyclopedias

All conversions from word counts to character counts assume an average word length of five letters plus a space, i.e. 5 + 1 = 6 characters per word.

  • On 1 May 2004, the English language Wikipedia had 239,000 articles, 79 million words and 474 million characters, giving a mean article length of 331 words. It also had 53,000 photographs and illustrations, 155,000 redirect pages (think of them as additional index entries of the form "for BBC, see British Broadcasting Corporation"), and a staggering 4.4 million links between pages.
  • The Columbia Encyclopedia, Sixth Edition, is cited as having 51,000 articles and 6.5 million words. This gives 39 million characters and a mean article length of 127 words.
  • Grolier Multimedia Encyclopedia Online claims 11 million words and 39,200 articles, giving 66 million characters and a mean article length of 281 words.
  • Microsoft Encarta Deluxe 2002 is cited as having "over 60,000 articles, 10,000 historical archives, and over 40 million words", giving 240 million characters and a mean article plus archive length of 571 words (40 million words divided by 70,000 entries).
  • The advertisements for Encyclopædia Britannica's 2002 edition proudly proclaim they have over 85,000 articles. A claimed word count of 55 million words gives an estimated 330 million characters, and a mean article length of 647 words.
  • Microsoft's Encarta Encyclopedia 2002 is cited as having 26 million words.
  • The Encyclopédie had 75,000 entries.
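
The derived figures above can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python, using only the article and word counts quoted in the list for four of the entries; the names `CHARS_PER_WORD`, `ENCYCLOPEDIAS` and `summarize` are illustrative and do not come from the WikiStatistics scripts.

```python
# 5 letters plus a space, the convention stated for this comparison.
CHARS_PER_WORD = 6

# (name, number of articles, number of words), as quoted in the list above.
ENCYCLOPEDIAS = [
    ("English Wikipedia (1 May 2004)", 239_000, 79_000_000),
    ("Columbia Encyclopedia, Sixth Edition", 51_000, 6_500_000),
    ("Grolier Multimedia Encyclopedia Online", 39_200, 11_000_000),
    ("Encyclopædia Britannica 2002", 85_000, 55_000_000),
]


def summarize(articles, words):
    """Return (estimated characters, mean article length in whole words)."""
    characters = words * CHARS_PER_WORD
    mean_length = round(words / articles)
    return characters, mean_length


for name, articles, words in ENCYCLOPEDIAS:
    characters, mean_length = summarize(articles, words)
    print(f"{name}: {characters / 1e6:.0f} million characters, "
          f"mean article length {mean_length} words")
```

Because the character counts are simply six times the word counts, they add no independent information; they are only useful for comparing against sources that report size in characters.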

Sizes of other non-encyclopedia information collections

The sizes of other, non-encyclopedia information collections are given here for comparison. Note that Wikipedia is neither a dictionary nor a web index: these figures are just for order-of-magnitude comparison.

Numbers to which Wikipedia aspires!

  • As of 2003, there are about six billion human beings, each with their own life story. Billions more have lived and died in the past, although most of their lives are lost to history.
  • And finally: it is accepted by astrophysicists that the number of particles in the observable universe is currently in the 10^85 range, much less than a googol (1 followed by 100 zeroes).


Footnote on WikiStatistics

Very detailed statistics for almost all aspects of Wikipedia are available from http://www.wikipedia.org/wikistats/EN/Sitemap.htm.

Statistics for this page are taken from the Article count (alternate) table and from the Words table.

Note that the calculation "number of words divided by number of articles" is an oversimplification which leads to a very slightly exaggerated number of words per article. Excluding redirect pages, there are:

  • 239,000 articles that have at least a single link and at least 200 readable characters (about 33 words or more).
  • 261,000 articles that have at least a single link, whatever their length (including those of fewer than 200 characters, about 33 words or less).

Taking the difference of these two figures, there are at least

  • 22,000 articles with at least a single link and fewer than 200 characters

and there is an uncounted number of articles which have no link, irrespective of their size. The current statistics (appear to) provide no guide as to the size of this last category of articles. The upshot is that the 79 million words in fact cover not only the 239,000 bona fide articles but also the rest of the articles: the 22,000 that are less than 200 characters and the uncounted number of articles without links. A guesstimate is that all these occupy 2 million words (e.g. 22k times 33 words = 726k words, and allow 1.24 M for all articles with no links), and so the mean article length in words should be 77 M words divided by 239 k articles = 322 words, instead of the 331 quoted above, an overestimate of about 2.8% (9 words in 322).
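
The correction in this footnote can be checked step by step. The sketch below is illustrative only: the variable names are made up for this example, and the 1.24 million word allowance for unlinked articles is the footnote's own guess, not a measured figure.

```python
TOTAL_WORDS = 79_000_000       # all words counted on the English Wikipedia
BONA_FIDE_ARTICLES = 239_000   # at least one link and 200 readable characters
LINKED_ARTICLES = 261_000      # at least one link, any length
WORDS_PER_SHORT_ARTICLE = 33   # upper bound: 200 characters at 6 per word
UNLINKED_ALLOWANCE = 1_240_000 # guessed words in articles with no links

# Articles with a link but fewer than 200 characters.
short_articles = LINKED_ARTICLES - BONA_FIDE_ARTICLES

# Words attributed to articles that are not bona fide.
short_words = short_articles * WORDS_PER_SHORT_ARTICLE
excluded_words = short_words + UNLINKED_ALLOWANCE

# The footnote rounds the excluded total to the nearest million words.
excluded_rounded = round(excluded_words, -6)

naive_mean = round(TOTAL_WORDS / BONA_FIDE_ARTICLES)
adjusted_mean = round((TOTAL_WORDS - excluded_rounded) / BONA_FIDE_ARTICLES)

# Overestimate of the naive mean relative to the adjusted mean, in percent.
overestimate_pct = 100 * (naive_mean - adjusted_mean) / adjusted_mean

print(f"short articles: {short_articles}")
print(f"excluded words: {excluded_words} (rounded to {excluded_rounded})")
print(f"naive mean: {naive_mean} words, adjusted mean: {adjusted_mean} words")
print(f"overestimate: {overestimate_pct:.1f}%")
```

The result is sensitive mainly to the guessed allowance for unlinked articles; the 22,000 short articles contribute well under one million words even at the 33-word upper bound.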

Further, of the articles on the English Wikipedia, perhaps 36,000 are "data dumped" gazetteer entries about towns and cities in the USA. There is still discussion as to whether gazetteer entries should count towards the number of "real" encyclopedia articles. Equally, their statistical significance is very much less now than in October 2002, when they were added, and very many have been colonised by Wikipedians who have transformed them to varying extents, some to an unimpeachably encyclopaedic status.

See also: