Artificial intelligence in Wikimedia projects: Difference between revisions

OAbot (talk | contribs)
m Open access bot: pmc updated in citation with #oabot.
Citation bot (talk | contribs)
Alter: template type, title. Add: doi, chapter-url, class, date, eprint, bibcode, authors 1-6. Removed or converted URL. Removed access-date with no URL. Upgrade ISBN10 to 13. | Use this bot. Report bugs. | Suggested by Dominic3203 | Category:AI software | #UCB_Category 24/37
Line 17:
In August 2018, a company called Primer reported attempting to use artificial intelligence to create Wikipedia articles about women as a way to address [[gender bias on Wikipedia]].<ref>{{Cite magazine |last1=Simonite |first1=Tom |title=Using Artificial Intelligence to Fix Wikipedia's Gender Problem |url=https://www.wired.com/story/using-artificial-intelligence-to-fix-wikipedias-gender-problem/ |magazine=Wired |date=3 August 2018}}</ref><ref>{{cite web |last1=Verger |first1=Rob |title=Artificial intelligence can now help write Wikipedia pages for overlooked scientists |url=https://www.popsci.com/artificial-intelligence-scientists-wikipedia |website=Popular Science |language=en |date=7 August 2018}}</ref>
 
[[File:DeepL machine translation of English Wikipedia example.png|thumb|Machine translation software such as [[DeepL]] is used by contributors.<ref>{{cite journal |last1=Costa-jussà |first1=Marta R. |last2=Cross |first2=James |last3=Çelebi |first3=Onur |last4=Elbayad |first4=Maha |last5=Heafield |first5=Kenneth |last6=Heffernan |first6=Kevin |last7=Kalbassi |first7=Elahe |last8=Lam |first8=Janice |last9=Licht |first9=Daniel |last10=Maillard |first10=Jean |last11=Sun |first11=Anna |last12=Wang |first12=Skyler |last13=Wenzek |first13=Guillaume |last14=Youngblood |first14=Al |last15=Akula |first15=Bapi |last16=Barrault |first16=Loic |last17=Gonzalez |first17=Gabriel Mejia |last18=Hansanti |first18=Prangthip |last19=Hoffman |first19=John |last20=Jarrett |first20=Semarley |last21=Sadagopan |first21=Kaushik Ram |last22=Rowe |first22=Dirk |last23=Spruit |first23=Shannon |last24=Tran |first24=Chau |last25=Andrews |first25=Pierre |last26=Ayan |first26=Necip Fazil |last27=Bhosale |first27=Shruti |last28=Edunov |first28=Sergey |last29=Fan |first29=Angela |last30=Gao |first30=Cynthia |last31=Goswami |first31=Vedanuj |last32=Guzmán |first32=Francisco |last33=Koehn |first33=Philipp |last34=Mourachko |first34=Alexandre |last35=Ropers |first35=Christophe |last36=Saleem |first36=Safiyyah |last37=Schwenk |first37=Holger |last38=Wang |first38=Jeff |title=Scaling neural machine translation to 200 languages |journal=Nature |date=June 2024 |volume=630 |issue=8018 |pages=841–846 |doi=10.1038/s41586-024-07335-x |language=en |issn=1476-4687|pmc=11208141 |bibcode=2024Natur.630..841N }}</ref><ref name="nyt180724"/><ref name="considerations">{{cite arXiv |title=Considerations for Multilingual Wikipedia Research |eprint=2204.02483 |last1=Johnson |first1=Isaac |last2=Lescak |first2=Emily |date=2022 |class=cs.CY }}</ref><ref>{{cite book |last1=Mamadouh |first1=Virginie |title=Handbook of the Changing World Language Map |date=2020 |publisher=Springer International Publishing |isbn=978-3-030-02438-3 |pages=3773–3799 |chapter-url=https://link.springer.com/referenceworkentry/10.1007/978-3-030-02438-3_200 |language=en |chapter=Wikipedia: Mirror, Microcosm, and Motor of Global Linguistic Diversity|doi=10.1007/978-3-030-02438-3_200 |quote=Some versions have expanded dramatically using machine translation through the work of bots or web robots generating articles by translating them automatically from the other Wikipedias, often the English Wikipedia. […] In any event, the English Wikipedia is different from the others because it clearly serves a global audience, while other versions serve more localized audience, even if the Portuguese, Spanish, and French Wikipedias also serves a public spread across different continents}}</ref> More than 40% of Wikipedia's active editors are in [[English Wikipedia]].<ref>{{cite arXiv |title=InfoSync: Information Synchronization across Multilingual Semi-structured Tables |eprint=2307.03313 |last1=Khincha |first1=Siddharth |last2=Jain |first2=Chelsi |last3=Gupta |first3=Vivek |last4=Kataria |first4=Tushar |last5=Zhang |first5=Shuo |date=2023 |class=cs.CL }}</ref>]]
===Generative models===
[[File:Wikipedia - Artificial intelligence in Wikimedia projects (spoken by AI voice).mp3|thumb|Wikipedia articles can be read using AI voice technology]]
Line 28:
 
==Using Wikimedia projects for artificial intelligence==
[[File:Models of high-quality language data – (a) Composition of high-quality datasets - The Pile (left), PaLM (top-right), MassiveText (bottom-right).png|thumb|Datasets of Wikipedia are widely used for training AI models<ref>{{cite arXiv |title=Will we run out of data? Limits of LLM scaling based on human-generated data |eprint=2211.04325 |last1=Villalobos |first1=Pablo |last2=Ho |first2=Anson |last3=Sevilla |first3=Jaime |last4=Besiroglu |first4=Tamay |last5=Heim |first5=Lennart |last6=Hobbhahn |first6=Marius |date=2022 |class=cs.LG }}</ref>]]
Content in Wikimedia projects is useful as a dataset in advancing artificial intelligence research and applications. For instance, in the development of Google's [[Perspective API]], which identifies toxic comments in online forums, a dataset containing hundreds of thousands of Wikipedia talk page comments with human-labelled toxicity levels was used.<ref>{{Cite news|url=https://www.engadget.com/2017/09/01/google-perspective-comment-ranking-system/|title=Google's comment-ranking system will be a hit with the alt-right|work=Engadget|date=2017-09-01}}</ref> Subsets of the Wikipedia corpus are considered the largest well-curated datasets available for AI training.<ref name="nyt180724"/><ref name="considerations"/>
 
Line 35:
A 2016 research project called "One Hundred Year Study on Artificial Intelligence" named Wikipedia as a key early project for understanding the interplay between artificial intelligence applications and human engagement.<ref>{{cite web |title=AI Research Trends - One Hundred Year Study on Artificial Intelligence (AI100) |url=https://ai100.stanford.edu/2016-report/section-i-what-artificial-intelligence/ai-research-trends |website=ai100.stanford.edu |language=en}}</ref>
 
Concerns have been raised about the lack of attribution to Wikipedia articles in large language models such as ChatGPT.<ref name="nyt180724">{{cite news |title=Wikipedia's Moment of Truth |url=https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html |access-date=29 November 2024 |work=New York Times}}</ref> While Wikipedia's licensing policy lets anyone use its texts, including in modified forms, it requires that credit be given, implying that AI models that use its contents in their answers without identifying the source may violate its terms of use.<ref name="nyt180724"/>
 
==See also==