User:Piotrus/Wikipedia interwiki and specialized knowledge test
|}
{{update|reason=New data for the current year should be added|date=August 2023}}
All the time one can hear claims that Wikipedia has "enough articles" and is unlikely to grow. And all the time those predictions are proven wrong. In summer 2006, there were about 2 million articles in need of translation from non-English Wikipedias, and more than 50 million specialized topics in need of creation (I justify those numbers below). In summer 2011, Wikipedia boasted 3.5 million articles, still covering less than 10% of what would be, roughly, comprehensive coverage of the world's notable subjects. Wikipedia is just in its infancy...
==Introduction==
I checked the pages of [[User:YurikBot]], [[Wikipedia:Interwikimedia link]], [[Wikipedia:Interlanguage links]] (shouldn't those two be merged?), and [[Wikipedia:Multilingual coordination]], but they don't seem to have the answer (or I can't find it :>)
Note: while the initial comparison (Polish Wikipedia, PSB) was done by me (<sub><span style="border:1px solid #228B22;padding:1px;">[[User:Piotrus|<
==Polish Wikipedia interwiki test==
So I decided to run a little test: take a [[random sample]] of 100 pages from [[Polish Wikipedia]] (4th largest Wikipedia with over 250,000 articles) and see how many have interwiki links to en wiki. The sample was taken by clicking the '[[Wikipedia:Random page|random page]]' button and noting whether each article had an interwiki link or not.
Results: out of 100 pages randomly selected on Polish Wikipedia, 72 had no interwiki links to en Wikipedia. (test as of 22 July 2006;
Notes:
# 10 Sept 2010: 76 had, 24 did not. Out of those which had, 54 had links to multiple wikis including the English one, 10 to the English wiki only, 3 to multiple wikis but not the English one, and 9 to a single non-English wiki. The following had no interwiki: Grzegorz Cwajg, Waldemar Wegrzyn, Wyroby hutnicze, Nawuj, Bartosz Konopka, Lubelska Szkola Biznesu, Slask Wroclaw w europejskich pucharach sezon 2000/2001, Jezioro Dziewicze (dolnoslaskie), Grzegorz Kuzniewicz, Alina Rossman, Ksiestwo czerskie, Terapia logopedyczna, Marek Boral, Stary Ujków, Gmina Góra (powiat warszawski), Towarzystwo Milosników Historii, Kapitan zeglugi wielkiej, Witold Tokarski, Intentional grounding, Jan Wyszkowic, Thiago Gomes, Hemicordulia tau, Kamieniec Zabkowicki (stacja kolejowa), Mieczyslaw Poznanski, Gmina Lagiewniki (ujednoznacznienie), Juan Antonio Albizu. The following had a single non-English interwiki: Powiat sulechowski - Esperanto, Maurycy Stefanowicz - Finnish, Les Ordres - Catalan, Cyryl (Dmitriew) - Russian, Roztoka Odrzanska - Danish, Brendan Doran - German, Renat Ibragimow - Russian, Ron Richards - German. The following had multiple non-English interwikis: Hovs kommuna, Naczyniak limfatyczny torbielowaty, Lubusz (ujednoznacznienie). The sample included 4 disambigs; in four instances I added a missing interwiki between Polish and English wikis (those instances are counted as having an interwiki).
# 23 May 2011: 74 had, 26 did not. Out of the 74 which did, 56 had links to multiple wikis including the English one, 9 to the English wiki only, 4 to multiple wikis but not the English one, and 5 to a single non-English wiki. The following had no interwiki: Kosciól sw. Antoniego Padewskiego w Gdansku, Swietlik nadobny, Bóstwa astralne, Ognisko (pismo), Rafal Szukiel, Galeria pod Sufitem, Ciri, Julian Kalowski, Kosc luskowa, Nowa figuracja, Fengxue Yanzhao, ORP Jastrzab (1963), Zbór Kosciola Ewangelicznych Chrzescijan w Lublinie, Albin Jasinski, Litoriusz, Adam Sobieraj, Renato Salvatore, Julija Sorokowska, SIERRA (program), Antoni Maciej Sierakowski, Ugoda polsko-ukrainska z 1935 roku, Kazimierz Dlugopolski, Yaro, Wskaznik protrombinowy, Artur Mkrtczjan, Osiedle Sloneczne (Wloclawek). The following had a single non-English interwiki: de - Ernst Eckstein, Dialekt malopolski, Herman David, Johannes Theodor Suhr, Klasztor kartuzów we Frankfurcie nad Odra, lt - Giemzy II. The following had multiple non-English interwikis: NGC 5892, Towarzystwo im. Mychajla Kaczkowskiego, Sferosom, Karmnik rozeslany. The sample included 3 disambigs; in 2 instances I added a missing interwiki between Polish and English wikis (those instances are counted as having an interwiki).
# 25 Oct 2011: 74 had, 26 did not. Out of 74 that did, 54 had links to multiple wikis including the English one, 13 to English wiki only, 5 to a single non-English one and 2 to multiple non English wikis. The following had no interwiki: Herb Łobza, Mistrzostwa Europy Juniorów w Lekkoatletyce 1989, Nasza ulica, Błeszyński (herb szlachecki), Pałac w Brzyskach, Ryszard "Skiba" Skibiński 1951-1983, Osiedle Miłocin (Rzeszów), Gry liczbowe, Killer (MUD), Michaił Okuń, Edward Rybarz, Dorota Brodowska, Most Siechnice-Łany, Obniżenie Mieroszowskie, Pomnik Piotra Skrzyneckiego w Krakowie (ul. Skawińska), Liceum, Sióstr Niepokalanek w Szymanowie k. Warszawy, Grupa mostowa, Ignacy Zatorski, Tarcie ruchowe, 3 Pułk Zmechanizowany, Indywidualne mistrzostwa Anglii na żużlu, Medal Stanisława Kostaneckiego, Krawędź tkaniny, Turniej Czterech Narodów w Bangkoku 1999, Szymon Czapiewski, Al-Dżarkana. The following ones, multiple non-English: Droga N14 (Holandia), Katja Wächter. The following one had single link to non-English wikis: 1 German (Oberharz), 2 Lithuanian (Parafia św. Jana Teologa w Dąbrowie Białostockiej, Kaletnik Mały) and 2 Dutch (Bartąg (przystanek kolejowy), Bieruń Stary). The sample included 4 disambigs, in 3 instances I added a missing interwiki between Polish and English wikis (those instances are counted as having an interwiki).
# 28 March 2012: 66 had, 34 did not. Out of 66 that did, 47 had links to multiple wikis including the English one, 8 to English wiki only, 4 to a single non-English one and 7 to multiple non English wikis. The following had no interwiki: Królowe Wirtembergii, Nokia DCT3, Lista mistrzów Formuly 1, Urszula Garwolinska, Swielub, Jan Olszanski, Trójkat (polaczenie), Palac w Szalszy, Aleksander Goluchowski, Towarzystwo Milosników Miasta Bydgoszczy, SN 2001by, Sznajderman, Miloslaw (ujednoznacznienie), 10 Pulk Artylerii Lekkiej, Slonce Arizon, Nowy Józefów (osiedle w Lodzi), Sizzo, Reprezentacja Kirgistanu na Mistrzostwach Swiata w Narciarstwie Klasycznym 2011, Wojciech Zalinski, Skaly melanokratyczne, Klasztor Karmelitów Bosych w Zawoi, Marek Kryda, Zoogoneticus, Linia kolejowa nr 92, Andrzej Malczewski, Skarzynski, Stacja Monitoringu Srodowiska Przyrodniczego UAM w Bialej Górze, Centralna Skladnica Marynarki Wojennej, Ihor Czeredniczenko, Moczylki Waskotorowe, Spóldzielcza Grupa Bankowa, Mieczyslaw Piotrowski (pisarz), Wyleganie, Naftali Lau-Lavie, Sari Ska Band. The following ones, multiple non-English: Club Voleibol Cuesta Piedra, NGC 495, Sankt-Georg-Schanze, Frankendorf, Paolo De Nicolò, Wyszeslaw Wlodzimierzowic, Rosyjska Akademia Sluzby Panstwowej. The following one had single link to non-English wikis: Slavonín (Czech), Åge Sten Nilsen (Norwegian), KAB-500 (Russian), Ukrainskie Regionalne Muzeum "Strywihor" w Przemyslu (Ukrainian). The sample included 8 disambigs.
# 26 August 2012: 77 had, 23 did not. Out of 77 that did, 68 had links to multiple wikis including the English one, 4 to English wiki only, 3 to a single non-English one and 2 to multiple non English wikis. The following had no interwiki: Roztoka (Góry Leluchowskie), Tribute to Rejestracja, Robert Sikorski, Kanada na Igrzyskach Imperium Brytyjskiego 1934, Metropolia Kansas City, Kreznica Okragla (kolonia w gminie Belzyce), Kreznica Okragla (kolonia w gminie Belzyce), Nowe Siolo (ujednoznacznienie), Kosciól sw. Jana Chrzciciela w Leszczawie Dolnej, Sluz gestagenny, Hotel Polonia we Wroclawiu, Parahomoceras, Parafia sw. Karola Boromeusza w Poznaniu, Urszula Zybura, Czeslaw Robakowski, Dynamo Tarnopol, Województwo slasko-dabrowskie, 2 Front, EXPAL, Szubieniczna (Kotlina Klodzka), Kaplica Swietego Krzyza w Lukowicy, Aleksander Dobrzanski (biskup), Herb Labiszyna, SN 1989R. The following ones, multiple non-English: NGC 2028, Blanc guenar. The following one had single link to non-English wikis: Indaeschna grubaueri (Dutch), Oleksij Hatin (French), Barak Baba (Turkish). The sample included 3 disambigs.
# 13 February 2013: 83 had, 13 did not. Out of 83 that did, 70 had links to multiple wikis including the English one, 7 to English wiki only, 5 to a single non-English one and 1 to multiple non English wikis. The following had no interwiki: Vápenná jaskyna, Kiczora (839), Rozwiniecie Herbranda, Bohdan Kurowski, Kynoforia, Najasnica, Siódme wtajemniczenie, Rotunda Najswietszej Marii Panny na Wawelu, Wulkan eksplozywny, Klimat podrównikowy, Cud Matki Boskiej Snieznej, Berlinka (statek), Bartlomiej Kwiatkowski, Wiktoria Quintana Argos, Dekanat Mogilany, I Liceum Ogólnoksztalcace im. Juliusza Slowackiego w Czestochowie. The following ones, multiple non-English: NGC 6147. The following one had single link to non-English wikis: Bilgoraj LHS (Dutch), Izaak Brudny (Russian), Antoni (Zawgorodny) (Russian), Olszyna Lubanska (Dutch), Østjyske Motorvej (Dutch). The sample included 5 disambigs.
# 15 May 2014: 78 had, 22 did not. Out of 78 that did, 60 had links to multiple wikis inc. the English one, 10 to English wiki only, 5 to a single non-English one and 3 to multiple non-English ones. The following had no interwiki: Pomnik przyrody w gminie Czorsztyn, Inbentos, Polska Bibliografia Narodowa, Józef Jagielski (komunista), Skala CEAP, Instytut Wiedzy i Innowacji, Jan Wislocki, Medalisci mistrzostw Polski seniorów w pchnieciu kula, Buczyna (powiat olkuski), Zwiazek Hodowców Psów Rasowych, Kajetan Proskura Suszczanski, Krzysztof Michalik (informatyk), Teatr Dramatyczny im. Jana Kochanowskiego w Opolu, Gromada Pogrzebien, Rózany Potok, Palac biskupi w Ciazeniu, Petecki, Memorial Józefa Dominika, Parafia sw. Marcina BW w Siemkowicach, Ruch Przyszlosci, Bractwo sw. Lukasza (Polska), Tylkowy Zleb, Gmina Kowala (województwo krakowskie). The following ones, multiple non-English: NGC 691 (19), Droga krajowa nr 14 (Polska) (3), Kulewcza (obwód Szumen) (5), Aleš Hanák (2). The following one had single link to non-English wikis: Radzic (Russian), Maccabi Nazaret YMCA (Israeli), Bitwa pod Labiszynem (Russian), Halina Dorda (Czech), Alfred Piper (German). The sample included 2 disambigs.
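Since each update rests on only 100 random pages, the sampling error is worth keeping in mind when comparing dates. The following is a minimal sketch, not part of the original test: it tabulates the counts listed above and attaches an approximate 95% Wilson score interval to the share of sampled pages that link to en wiki. The names <code>SAMPLES</code>, <code>wilson_interval</code> and <code>summarize</code> are made up for illustration, and the sample size of 100 is assumed for every row.

```python
import math

# (date, pages with any interwiki, pages linking to en wiki) per update.
# The English count is "multiple wikis including English" + "English only",
# as reported in the list above.
SAMPLES = [
    ("2010-09-10", 76, 54 + 10),
    ("2011-05-23", 74, 56 + 9),
    ("2011-10-25", 74, 54 + 13),
    ("2012-03-28", 66, 47 + 8),
    ("2012-08-26", 77, 68 + 4),
    ("2013-02-13", 83, 70 + 7),
    ("2014-05-15", 78, 60 + 10),
]


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


def summarize(samples, n=100):
    """Return (date, any interwiki, en interwiki, low, high) for each sample.

    n=100 is an assumption: every update is treated as a 100-page sample.
    """
    rows = []
    for date, any_iw, en_iw in samples:
        low, high = wilson_interval(en_iw, n)
        rows.append((date, any_iw, en_iw, round(low, 3), round(high, 3)))
    return rows


if __name__ == "__main__":
    for row in summarize(SAMPLES):
        print(row)
```

With samples this small the intervals are roughly 9 to 10 percentage points wide on each side, so differences of a few points between two updates are within sampling noise.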
== Specialized knowledge test ==
Next, I decided to run a comparison of 'how many articles from a random encyclopedic publication' are missing on Wikipedia. The publication I selected, [[Polski Słownik Biograficzny]] (encyclopedia of famous Poles), was not completely random, but as far as I know there is no project dedicated to creating relevant stubs on en-wiki, and as one of my past projects there is a nice index at [[User:Piotrus/List of Poles]]. Note also that PSB is not a general knowledge encyclopedia but a specialized knowledge encyclopedia.
Results: as of 22 July 2006, out of the 1000 selected entries of [[User:Piotrus/List of Poles/Kisielinski-Korzelinski]], about 30 entries have blue links (I ignored entries in need of disambiguation, like 10 entries for [[Konrad]]).
Notes:
=== Updates ===
Preliminary analysis suggests a coverage improvement of ~1% per year, with estimated completion around the turn of the century, assuming a linear growth model...
# 8 August 2007. I counted 34 blue links in 'Kisielinski-Korzelinski'. I counted two more for better stats: 'Olbrycht-Pawleta' - 37; 'Ebenberger-Gembicki' - 28 - so the ~3% still holds.
# 16 May 2008. 'Jesionowski-Kisielewski': 47. 'Skowron-Spiczakow': 23. 'Biergel-Bzowski': 36. Some interesting outliers, but it is safe to say ~3% still holds.
# 25 December 2008. 'Biergel-Bzowski': 36, 'Hoser-Jerzykowski': 46, 'Majnert-Michiels': 44. ~4%?
# 23 March 2009. 'Danielski-Dzwonkowski': 52. 'Lichtenstein-Majkowski': 67. 'Rutowicz-Schreiber': 58. ~5%?
# 16 June 2009. 'Skowron-Spiczakow': 28. 'Przyalgowski-Retke': 65. 'Grodecki-Hoscki': 48. ~5%?
# 8 Dec 2010. 'Kisielinski-Korzelinski' 58. 'Olbrycht-Pawleta'
# 23 May 2011. 'Gemma-Groddeck' 58; 'Rutowicz-Schreiber' - 70; 'Krzesinski-Lichtarowicz' - 61. Keeping at ~6%
# 25 Oct 2011. 'Abakanowicz-Bienkowski' 57, 'Korzeniewski-Krzesimowski' 67, 'Skowron-Spiczakow' - 37. No change.
# 29 March 2012. I counted 63 blue links in 'Kisielinski-Korzelinski'. 'Olbrycht-Pawleta' - 70; 'Ebenberger-Gembicki' - 50. No change. This time I decided to repeat the first sample.
# 26 August 2012. 'Jesionowski-Kisielewski': 72. 'Skowron-Spiczakow': 38. 'Biergel-Bzowski': 59. ~6% still seems to be the rough estimate. Repeated second sample. Individual samples seem to suggest rough doubling within their populations.
# 13 February 2013 'Biergel-Bzowski': 58, 'Hoser-Jerzykowski': 66, 'Majnert-Michiels': 72. ~6.5%?
# 15 May 2014. 'Rettel-Rutkowski': 46, 'Gemma-Groddeck': 59, 'Lichtenstein-Majkowski': 89. ~6.5%?
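The rough percentages quoted in the updates can be recomputed from the blue-link counts. This is only a sketch under one assumption: that every index page holds 1000 entries, as stated for 'Kisielinski-Korzelinski'. The truncated 8 Dec 2010 entry is left out, and the names <code>UPDATES</code>, <code>ENTRIES_PER_PAGE</code> and <code>mean_coverage_percent</code> are illustrative only.

```python
# Blue-link counts per index page, copied from the updates above.
# The 8 Dec 2010 entry is incomplete in the source and is omitted.
UPDATES = {
    "2007-08-08": [34, 37, 28],
    "2008-05-16": [47, 23, 36],
    "2008-12-25": [36, 46, 44],
    "2009-03-23": [52, 67, 58],
    "2009-06-16": [28, 65, 48],
    "2011-05-23": [58, 70, 61],
    "2011-10-25": [57, 67, 37],
    "2012-03-29": [63, 70, 50],
    "2012-08-26": [72, 38, 59],
    "2013-02-13": [58, 66, 72],
    "2014-05-15": [46, 59, 89],
}

# Assumption: each index page lists 1000 entries.
ENTRIES_PER_PAGE = 1000


def mean_coverage_percent(counts, entries_per_page=ENTRIES_PER_PAGE):
    """Share of entries with an article, pooled over the sampled pages, in percent."""
    return 100 * sum(counts) / (len(counts) * entries_per_page)


if __name__ == "__main__":
    for date, counts in UPDATES.items():
        print(date, round(mean_coverage_percent(counts), 1))
```

Because each update samples different pages, the pooled figure moves around by a point or so from one update to the next; the rounded estimates in the list smooth over that.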
Updated conclusion (as of February '11): It appears that Wikipedia is growing faster in some other areas than in Polish biographies. Wikipedia's coverage of Polish biographies doubled between June '06 and December '10, but its total number of articles grew almost threefold in that period (well, around 2.6 times). If we were to take the June '09 or Dec '10 numbers and try to estimate the size of a complete Wikipedia, we would get ~60 million instead of the 40 million that the June '06 data would suggest. Of course, as the growth in Polish biographies has not kept pace with the growth of Wikipedia, it is obviously hardly a perfect estimator. Assuming it is some kind of an estimator, we might as well take an average of those two results and call the ultimate, comprehensive size 50 million.
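The arithmetic behind the 40, 60 and 50 million figures can be sketched as follows. The inputs are assumptions for illustration: ~1.2 million articles for mid-2006 is simply the count implied by a 40 million estimate at ~3% coverage, and 3.5 million is the summer 2011 figure from the introduction paired with the ~6% estimate. The function name <code>estimated_full_size</code> is made up.

```python
def estimated_full_size(current_articles, coverage_fraction):
    """Scale the current article count up by the sampled coverage fraction."""
    return current_articles / coverage_fraction


# Assumption: ~1.2 million articles in mid-2006, at ~3% coverage.
early = estimated_full_size(1_200_000, 0.03)

# Assumption: 3.5 million articles (summer 2011 figure), at ~6% coverage.
later = estimated_full_size(3_500_000, 0.06)

# Averaging the two estimates, as the conclusion above suggests.
average = (early + later) / 2

if __name__ == "__main__":
    for label, value in (("early", early), ("later", later), ("average", average)):
        print(label, round(value / 1e6, 1), "million")
```

The estimate is very sensitive to the coverage fraction: moving from 6% to 6.5% alone shifts the later figure by several million articles.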
* [[Wikipedia:Modelling Wikipedia's growth]]
* [[Wikipedia:WikiProject Missing encyclopedic articles]]
* [[User:Emijrp/All human knowledge]]
* [[meta:List of Wikipedias]]
* [[User:Piotrus/Morsels of wikiwisdom]] - my essays on failings in wikipedia social structure