User:Piotrus/Wikipedia interwiki and specialized knowledge test

|}
 
One often hears claims that Wikipedia has "enough articles" and is unlikely to grow, and those predictions keep being proven wrong. In summer 2006, there were about 2 million articles in need of translation from non-English Wikipedias, and more than 50 million specialized topics in need of creation (I justify those numbers below). Wikipedia is just in its infancy...
 
==Introduction==
One of the interesting questions about Wikipedia is "how much more information is there for Wikipedia to assimilate?"
 
Part of that answer, if we look at [[English Wikipedia]], is the number of articles from non-English Wikipedias that need to be translated. I was somewhat surprised that we have no such statistics; at least, I was unable to find information on how many articles on a given wiki (for example, Polish Wikipedia) have interwiki links to a specific (English) wiki.
 
I checked pages of [[User:YurikBot]] and on [[Wikipedia:Interwikimedia link]], [[Wikipedia:Interlanguage links]] (shouldn't those two be merged?), and [[Wikipedia:Multilingual coordination]], but they don't seem to have the answer (or I can't find it :>)
So I decided to run a little test: take a [[random sample]] of 100 pages from [[Polish Wikipedia]] (4th largest Wikipedia with over 250,000 articles) and see how many have interwiki links to en wiki. The sample was taken by clicking the '[[Wikipedia:Random page|random page]]' button and noting whether each article had an interwiki link to en wiki or not.
 
Results: out of 100 pages randomly selected on Polish Wikipedia, 72 had no interwiki links to en Wikipedia (test as of 22 July 2006; Wikipedia at that time had about 1,350,000 articles).
 
Notes:
#The number of pages with an en-wiki equivalent may be slightly undercounted, as it is possible some pages have an unlinked equivalent on en-wiki. If I were to check, I expect that around 10 pages would have equivalents on en wiki.
#I did count disambig pages.
#Interestingly, quite a few pages that have no interwiki to English wiki have interwikis to wikis in other languages. I did not keep track of that, just noted the general trend.
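A rough sketch of what the sample above implies, assuming the 100 pages are a simple random sample and using the normal approximation for a proportion. The 250,000 figure is the lower bound quoted above ("over 250,000"), and the function name is made up for illustration.

```python
import math

def proportion_interval(hits, n, z=1.96):
    """Sample proportion with a normal-approximation 95% interval."""
    p = hits / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# 72 of 100 sampled pl-wiki pages had no interwiki link to en wiki.
p, low, high = proportion_interval(72, 100)

# Assumption: Polish Wikipedia size taken as the "over 250,000" lower bound.
polish_articles = 250_000
estimate = p * polish_articles

print(f"share without en interwiki: {p:.0%} (roughly {low:.0%} to {high:.0%})")
print(f"implied pl-wiki articles without en interwiki: about {estimate:,.0f}")
```

With only 100 pages the interval is wide (several percentage points either way), which is one more reason to repeat the test with a larger sample.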
 
=== To do ===
* see if the same number holds true for other Wikipedias. Analysis of German (second largest) and French (third largest) would be quite useful. Adding the other seven to cover the 'top ten' would further increase the statistical significance of this analysis across the Wikipedia projects.
* get a bot to run those tests every month or so
* record data on interwikis to non-English languages if English is not present
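A minimal sketch of how such a bot could take its sample, assuming the MediaWiki Action API's <code>generator=random</code> and <code>prop=langlinks</code> modules. This is not an existing bot: the endpoint choice (pl.wikipedia.org), the function names and the User-Agent string are all placeholders, and the number of pages returned per request is capped by the server, so a real run would loop over several batches.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Polish Wikipedia's standard Action API endpoint.
API_URL = "https://pl.wikipedia.org/w/api.php"

def build_query(sample_size=10, target_lang="en"):
    """Build a request URL for random main-namespace pages with their langlinks."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "random",
        "grnnamespace": 0,        # articles only
        "grnlimit": sample_size,  # the server may cap this value
        "prop": "langlinks",
        "lllang": target_lang,    # only report links to the target wiki
        "lllimit": "max",
    }
    return API_URL + "?" + urlencode(params)

def count_missing(response):
    """Return (pages without a target-language link, pages seen)."""
    pages = list(response.get("query", {}).get("pages", {}).values())
    missing = sum(1 for page in pages if not page.get("langlinks"))
    return missing, len(pages)

def fetch_sample(sample_size=10, target_lang="en"):
    """Fetch one batch from the live API and count pages lacking the link."""
    request = Request(
        build_query(sample_size, target_lang),
        headers={"User-Agent": "interwiki-sample-sketch/0.1 (hypothetical)"},
    )
    with urlopen(request) as reply:
        return count_missing(json.load(reply))
```

Calling <code>fetch_sample()</code> repeatedly and summing the two counts would reproduce the manual 'random page' test; dropping <code>lllang</code> would also expose links to non-English wikis.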
Next, I decided to run a comparison of 'how many articles from a random encyclopedic publication' are missing on Wikipedia. The publication I selected, [[Polski Słownik Biograficzny]] (encyclopedia of famous Poles), was not completely random, but as far as I know there is no project dedicated to creating relevant stubs on en-wiki, and as one of my past projects there is a nice index at [[User:Piotrus/List of Poles]]. Note also that PSB is not a general knowledge encyclopedia but a specialized knowledge encyclopedia.
 
Results: as of 22 July 2006, out of the selected 1000 entries of [[User:Piotrus/List of Poles/Kisielinski-Korzelinski]], about 30 entries have blue links (I ignored entries in need of disambiguation, like 10 entries for [[Konrad]]). Wikipedia at that time had about 1,350,000 articles.
 
Notes:
#As the bot we used to generate this index was not perfect, and it doesn't help if there is an entry with diacritics but no redirect without them, the blue links may be somewhat undercounted, but I would be very surprised if by 50%.
#Due to the nature of PSB (print publication, not easily updated) and my own observation (how many Poles who have articles on Wikipedia are not listed in PSB), it should be noted that PSB does not represent perfect coverage in its area. But for now I assume that the number of articles about Poles that Wikipedia has but PSB lacks, and the number of PSB entries that are missing from Wikipedia, even out.
 
Conclusion: assuming PSB represents the average coverage of specialized knowledge on (English) Wikipedia, Wikipedia currently covers about 3% of such knowledge. If 3% = 1,350,000 articles, then 100% would equal, roughly, 40,000,000 (40 million) articles. Therefore Wikipedia will be approaching somewhat comprehensive coverage of specialized knowledge when we have about 40,000,000 articles.{{dubious|garbage}} This is a very rough estimate, but it is my reply to some people who said there is not enough encyclopedic knowledge [[Wikipedia:Two-million pool|to merit 2,000,000 articles]], as well as to the very optimistic estimates of [[Wikipedia:WikiProject Missing encyclopedic articles]] (Biographies - 92.6% done ?? who are they kidding? :D )
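The extrapolation can be sketched as a single division, under the same assumption that the PSB sample is representative. The helper name is made up for illustration. Taken literally, the division gives 45 million, which the text above rounds to roughly 40 million; the second call shows how the estimate moves if the blue links were undercounted by the 50% mentioned in the notes.

```python
def implied_total(current_articles, coverage):
    """Articles needed for full coverage if current_articles is `coverage` of it."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be a fraction in (0, 1]")
    return current_articles / coverage

articles_2006 = 1_350_000   # Wikipedia size on 22 July 2006, from the text
blue_links, entries = 30, 1000

coverage = blue_links / entries                    # 0.03
total = implied_total(articles_2006, coverage)

# Sensitivity: blue links undercounted by 50% (45 instead of 30).
total_if_undercounted = implied_total(articles_2006, 45 / entries)

print(f"coverage {coverage:.0%} -> about {total / 1e6:.0f} million articles")
print(f"coverage 4.5% -> about {total_if_undercounted / 1e6:.0f} million articles")
```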
 
=== To do ===
=== Updates ===
Preliminary analysis suggests a coverage improvement of ~1% per year, with estimated completion around 2105...
# 8 August 2007. I counted 34 blue links in 'Kisielinski-Korzelinski'. I counted two more for better stats: 'Olbrycht-Pawleta': 37; 'Ebenberger-Gembicki': 28. So the ~3% still holds. Wikipedia at that time had about 1,800,000 articles.
# 16 May 2008. 'Jesionowski-Kisielewski': 47. 'Skowron-Spiczakow': 23. 'Biergel-Bzowski': 36. Some interesting outliers, but it is safe to say ~3% still holds. Wikipedia at that time had about 2,250,000 to 2,300,000 articles.
# 25 December 2008. 'Biergel-Bzowski': 36. 'Hoser-Jerzykowski': 46. 'Majnert-Michiels': 44. ~4%? Wikipedia at that time had about 2,600,000 articles.
# 23 March 2009. 'Danielski-Dzwonkowski': 52. 'Lichtenstein-Majkowski': 67. 'Rutowicz-Schreiber': 58. ~5%? Wikipedia at that time had about 2,750,000 articles.
# 16 June 2009. 'Skowron-Spiczakow': 28. 'Przyalgowski-Retke': 65. 'Grodecki-Hoscki': 48. ~5%? Wikipedia at that time had about 2,950,000 articles.
# 8 Dec 2010. 'Kisielinski-Korzelinski': 58. 'Olbrycht-Pawleta': 66. 'Ebenberger-Gembicki': 60. ~6%, and double the coverage of 2007. Wikipedia at that time had about 3,500,000 articles.
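The updates above can be tabulated with a short script. This is only a sketch: it assumes every index page holds 1000 entries (the text states this only for 'Kisielinski-Korzelinski'), and for 16 May 2008 it uses the lower of the two article counts given.

```python
# Assumption: each sampled index page lists 1000 PSB entries.
SAMPLE_SIZE = 1000

# (date, blue-link counts per index page, Wikipedia article count at the time)
updates = [
    ("2007-08-08", [34, 37, 28], 1_800_000),
    ("2008-05-16", [47, 23, 36], 2_250_000),  # the text also mentions 2,300,000
    ("2008-12-25", [36, 46, 44], 2_600_000),
    ("2009-03-23", [52, 67, 58], 2_750_000),
    ("2009-06-16", [28, 65, 48], 2_950_000),
    ("2010-12-08", [58, 66, 60], 3_500_000),
]

def summarize(rows, sample_size=SAMPLE_SIZE):
    """Return (date, mean coverage, implied complete size) for each update."""
    result = []
    for date, counts, articles in rows:
        coverage = sum(counts) / (len(counts) * sample_size)
        result.append((date, coverage, articles / coverage))
    return result

for date, coverage, total in summarize(updates):
    print(f"{date}: coverage {coverage:.1%}, implied total {total / 1e6:.0f} million")
```

Because each date samples different index pages, the per-date figures are noisy; averaging more pages per date would smooth them.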
 
It appears that Wikipedia is growing faster in some other areas than in Polish biographies. Wikipedia's coverage of Polish biographies doubled between July '06 and December '10, but its total number of articles grew almost threefold in that period (well, around 2.6 times). If we were to take the June '09 or Dec '10 numbers and try to estimate the size of a complete Wikipedia, we would get ~60 million instead of the 40 million that the July '06 data would suggest. Of course, as the growth in Polish biographies has not kept pace with the growth of Wikipedia, it is obviously hardly a perfect estimator. Assuming it is some kind of an estimator, we might as well take the average of those two results and call the ultimate, comprehensive size 50 million.
 
== See also ==