This article compares the size of Wikipedia with other encyclopedias and information collections.
Source material from which Wikipedia statistics in this article are derived is available here; the Footnote on WikiStatistics section at the end of this page provides technical discussion of this article.
At the start of May 2004, the English Wikipedia alone had 239,000 articles of 200 characters or greater, and the Wikipedias in all other languages combined exceeded the English Wikipedia in size, giving a combined total of 582,000 articles in 82 languages. The English Wikipedia alone has now reached 79 million words in size, comfortably eclipsing the largest previously existing encyclopedias.
Overall article growth continues to increase exponentially, but at a rate less than originally estimated. This analysis suggests that the "organic" hand-edited growth rate of a mature Wikipedia is slightly less than doubling growth per year, with younger Wikipedias growing somewhat faster.
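To make the growth claim concrete, the following Python sketch projects an article count under constant annual growth and derives the corresponding doubling time. The growth factor of 1.9 is purely an illustrative assumption standing in for "slightly less than doubling"; it is not a measured figure, and the function names are hypothetical.

```python
import math


def project_articles(start, annual_factor, years):
    """Article count after `years` of constant exponential growth."""
    return start * annual_factor ** years


def doubling_time_years(annual_factor):
    """Years needed to double at a constant annual growth factor (> 1)."""
    return math.log(2) / math.log(annual_factor)


ASSUMED_FACTOR = 1.9  # illustrative assumption, not a measured rate

# Start from the 239,000 English articles counted in May 2004.
for years in range(4):
    print(years, round(project_articles(239_000, ASSUMED_FACTOR, years)))
print(f"doubling time: {doubling_time_years(ASSUMED_FACTOR):.2f} years")
```

Any factor below 2 gives a doubling time longer than one year, which is what "slightly less than doubling growth per year" implies.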
Not bad for an encyclopedia that is only three years old. But we must do better! Many of the articles are still of poor quality, and the average article length is only a little over half that of Encyclopædia Britannica. As Wikipedia grows more comprehensive, efforts are expected to shift towards improving the quality, scope and interlinkage of existing articles rather than creating new ones. It is also anticipated that Wikipedia will grow to include a global gazetteer as part of its function.
See Wikipedia:Modelling Wikipedia's growth for more educated guesses about the potential growth of Wikipedia.
Comparison of Encyclopedias
All conversions from word counts to character counts assume an average word length of five letters plus a space, i.e. (5+1) = 6 characters per word.
- On 1 May 2004, the English language Wikipedia had 239,000 articles, 79 million words and 474 million characters, giving a mean article length of 331 words. It also had 53,000 photographs and illustrations, 155,000 redirect pages (think of them as additional index entries of the form "for BBC, see British Broadcasting Corporation"), and a staggering 4.4 million links between pages.
- The Columbia Encyclopedia, Sixth Edition, is cited as having 51,000 articles and 6.5 million words. This gives 39 million characters and a mean article length of 127 words.
- Grolier Multimedia Encyclopedia Online claims 11 million words and 39,200 articles, giving 66 million characters and a mean article length of 281 words.
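- Microsoft Encarta Deluxe 2002 is cited as having "over 60,000 articles, 10,000 historical archives, and over 40 million words". The derived figures of 156 million characters and a mean article plus archive length of 371 words correspond to a count of 26 million words spread over 70,000 items; at 40 million words the same arithmetic would give 240 million characters and a mean length of 571 words.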
- The advertisements for Encyclopædia Britannica's 2002 edition proudly proclaim they have over 85,000 articles. A claimed word count of 55 million words gives an estimated 330 million characters, and a mean article length of 647 words.
- Microsoft's Encarta Encyclopedia 2002 is cited as having 26 million words.
- The Encyclopédie had 75,000 entries.
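The derived figures in the list above can be reproduced with a short calculation. The following is a minimal Python sketch, assuming the six-characters-per-word convention stated at the top of this section; the function and variable names are illustrative only. Encarta is left out because two different word counts are quoted for it.

```python
# Characters per word assumed in this article: five letters plus one space.
CHARS_PER_WORD = 6

# (name, words, articles) as quoted in the list above.
ENCYCLOPEDIAS = [
    ("English Wikipedia (May 2004)", 79_000_000, 239_000),
    ("Columbia Encyclopedia, Sixth Edition", 6_500_000, 51_000),
    ("Grolier Multimedia Encyclopedia Online", 11_000_000, 39_200),
    ("Encyclopædia Britannica 2002", 55_000_000, 85_000),
]


def derived_size(words, articles):
    """Return (estimated characters, mean article length in whole words)."""
    characters = words * CHARS_PER_WORD
    mean_words = round(words / articles)
    return characters, mean_words


for name, words, articles in ENCYCLOPEDIAS:
    characters, mean_words = derived_size(words, articles)
    print(f"{name}: {characters / 1e6:.0f} million characters, "
          f"mean article length {mean_words} words")
```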
Sizes of other non-encyclopedia information collections
The following figures are given for comparison. Note that Wikipedia is neither a dictionary nor a web index: these figures are just for order-of-magnitude comparison.
- The Merck Index Subscription Edition has over 10,000 monographs on chemical compounds.
- Online Mendelian Inheritance in Man has 15,255 entries, each describing a known gene, as of April 8th, 2004.
- Black's Law Dictionary 7th ed. has 24,500 common law legal terms.
- American Jurisprudence, 1st ed., is an 83-volume collection of American common law; the 2nd ed. runs to 231 volumes!
- The New Grove Dictionary of Music and Musicians, 2nd edition, claims "25 million words with over 29,000 articles" about the subject of music alone.
- Each human being is estimated to have 30,000 to 40,000 genes, each of which probably deserves an article.
- The old Dictionary of National Biography had 36,500 articles in 33 million words.
- The OUP's New Dictionary of National Biography has a target size of 50,000 articles on famous Britons, in 50 million words (implying an average article size of 1000 words). If a country of 60 million people has 50,000 famous people in its history, a world of six billion people should have 5,000,000 famous people in its history.
- The New Oxford Dictionary of English claims 350,000 definitions, and four million words.
- As of March 2004, the dmoz web index claims to have over 590,000 categories (for a total of over 4.9 million websites, but the categories are what is important here).
- The freedb database holds information for around 1,296,579 compact discs.
- The World Resources Institute claims that approximately 1.4 million species have been named, out of an unknown number of total species (estimates range between 2 and 100 million species).
- As of January 2004, the Internet Movie Database claims to have records on "380,000 titles and 1.6 million names", making up a total of "over 6.3 million individual film/TV credits".
- As of March 2004, the USGS Geographic Names Information System claims to have almost 2 million physical and cultural geographic features within the United States.
- The NIMA (http://www.nima.mil/) GEOnet Names Server contains approximately 3.88 million named geographical features outside the United States, with 5.34 million names.
- The Beilstein database claims entries on "8 million organic and 1.4 million inorganic and organometallic compounds".
- Netcraft logged roughly 46 million distinct websites in January 2004.
- The Library of Congress claims that it holds approximately 119 million items.
- The British Library claims that it holds over 150 million items.
- The Guide Star Catalog II has entries on 998,402,801 distinct astronomical objects.
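The extrapolation in the New Dictionary of National Biography entry above is a simple proportional scaling. A minimal Python sketch follows, assuming (as that entry does) that the share of famous people is the same in every population; the function name is hypothetical.

```python
def scale_notable_people(notable, population, target_population):
    """Scale a count of notable people in proportion to population size."""
    return notable * target_population // population


# 50,000 famous Britons in a country of 60 million, scaled to six billion people.
world_estimate = scale_notable_people(50_000, 60_000_000, 6_000_000_000)
print(f"{world_estimate:,}")
```

The same function can be applied to any of the other per-country figures, with the caveat that the uniform-share assumption is a rough one.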
Numbers to which Wikipedia aspires!
- As of 2003, there are about six billion human beings, each with their own life story. Billions more have lived and died in the past, although most of their lives are lost to history.
- And finally: it is accepted by astrophysicists that the number of particles in the observable universe is currently in the 10^85 range, much less than a googol (1 followed by 100 zeroes).
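The gap between the particle estimate and a googol can be checked with exact integer arithmetic. A small Python sketch, reading the particle estimate as 10 to the power 85 (the variable names are illustrative):

```python
particles = 10 ** 85   # rough particle count quoted above
googol = 10 ** 100     # 1 followed by 100 zeroes

# Python integers have arbitrary precision, so this division is exact.
ratio = googol // particles

print(len(str(googol)) - 1)   # number of zeroes in a googol
print(ratio == 10 ** 15)      # a googol is 10^15 times larger
```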
Footnote on WikiStatistics
Very detailed statistics for almost all aspects of Wikipedia are available from http://www.wikipedia.org/wikistats/EN/Sitemap.htm.
Statistics for this page are taken from the Article count (alternate) table and from the Words table.
Note that dividing the number of words by the number of articles is an oversimplification which slightly exaggerates the number of words per article. Excluding redirect pages, there are:
- 239,000 articles that have at least a single link and at least 200 readable characters (about 33 words or more).
- 261,000 articles that have at least a single link, regardless of length (some are shorter than 200 characters, or about 33 words).
Taking the difference of these two figures, there are about
- 22,000 articles with at least a single link and fewer than 200 characters
and there is an uncounted number of articles which have no link, irrespective of their size. The current statistics appear to provide no guide to the size of this last category of articles. The upshot is that the 79 million words in fact cover not only the 239,000 bona fide articles but also the rest of the articles: the 22,000 that are shorter than 200 characters and the uncounted number of articles without links. A guesstimate is that all these occupy about 2 million words (e.g. 22k times 33 words = 726k words, allowing 1.24 M for all articles with no links), so the mean article length should be 77 M words divided by 239 k articles = 322 words, instead of the 331 quoted above, an overestimate of about 2.8%.
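The correction above can be retraced step by step. This is a minimal Python sketch of the arithmetic; the 1.24 million word allowance for unlinked articles is the guess made in the text, not a measured value, and the constant names are illustrative.

```python
TOTAL_WORDS = 79_000_000        # total word count quoted above
COUNTED_ARTICLES = 239_000      # articles with a link and 200+ characters
SHORT_ARTICLES = 22_000         # linked articles under 200 characters
WORDS_PER_SHORT_ARTICLE = 33    # upper bound: 200 characters / 6, rounded
UNLINKED_ALLOWANCE = 1_240_000  # guessed words in articles with no links

short_words = SHORT_ARTICLES * WORDS_PER_SHORT_ARTICLE    # 726,000
excluded_words = short_words + UNLINKED_ALLOWANCE         # 1,966,000
# Round the excluded words to the nearest million, as the text does.
adjusted_words = TOTAL_WORDS - round(excluded_words, -6)  # 77 million

naive_mean = round(TOTAL_WORDS / COUNTED_ARTICLES)
adjusted_mean = round(adjusted_words / COUNTED_ARTICLES)
overestimate = (naive_mean - adjusted_mean) / adjusted_mean

print(naive_mean, adjusted_mean, f"{overestimate:.1%}")
```

The overestimate here is computed from the two rounded means, so it depends slightly on where the rounding is applied.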
Further, of the articles on the English Wikipedia, perhaps 36,000 are "data dumped" gazetteer entries about towns and cities in the USA. There is still discussion as to whether gazetteer entries should count towards the number of "real" encyclopedia articles. Equally, their statistical significance is very much less now than in October 2002, when they were added, and very many have been colonised by Wikipedians who have transformed them to varying extents, in some cases to an unimpeachably encyclopaedic status.
See also: