Artificial intelligence in Wikimedia projects

[[Artificial intelligence]] is used in [[Wikipedia]] and other [[Wikimedia projects]] for the purpose of developing those projects.<ref>{{cite web |last1=Marr |first1=Bernard |title=The Amazing Ways How Wikipedia Uses Artificial Intelligence |url=https://www.forbes.com/sites/bernardmarr/2018/08/17/the-amazing-ways-how-wikipedia-uses-artificial-intelligence/#7cbdda802b9d |website=Forbes |language=en |date=17 August 2018}}</ref><ref name="NYT-20230718">{{cite news |last=Gertner |first=Jon |title=Wikipedia's Moment of Truth - Can the online encyclopedia help teach A.I. chatbots to get their facts right — without destroying itself in the process? |url=https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html |date=18 July 2023 |work=[[The New York Times]] |url-status=bot: unknown |archiveurl=https://web.archive.org/web/20230718233916/https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html |archivedate=18 July 2023 |accessdate=19 July 2023 }}</ref> Human and [[Internet bot|bot]] interaction in Wikimedia projects is routine and iterative.<ref>{{cite arXiv |last1=Piscopo |first1=Alessandro |title=Wikidata: A New Paradigm of Human-Bot Collaboration? |date=1 October 2018 |eprint=1810.00931|class=cs.HC }}</ref>
 
== Using artificial intelligence for Wikipedia ==
Various projects seek to improve Wikipedia and Wikimedia projects by using artificial intelligence tools.
 
=== ORES ===
The Objective Revision Evaluation Service (ORES) project is an artificial intelligence service for grading the quality of Wikipedia edits.<ref>{{cite web |last1=Simonite |first1=Tom |date=1 December 2015 |title=Software That Can Spot Rookie Mistakes Could Make Wikipedia More Welcoming |url=https://www.technologyreview.com/s/544036/artificial-intelligence-aims-to-make-wikipedia-friendlier-and-better/ |website=MIT Technology Review |language=en}}</ref><ref>{{Cite magazine |last1=Metz |first1=Cade |date=1 December 2015 |title=Wikipedia Deploys AI to Expand Its Ranks of Human Editors |url=https://www.wired.com/2015/12/wikipedia-is-using-ai-to-expand-the-ranks-of-human-editors/ |magazine=Wired |archive-url=https://web.archive.org/web/20240402000516/https://www.wired.com/2015/12/wikipedia-is-using-ai-to-expand-the-ranks-of-human-editors/ |archive-date=2 Apr 2024}}</ref> The Wikimedia Foundation presented the ORES project in November 2015.<ref>{{cite web |last1=Halfaker |first1=Aaron |last2=Taraborelli |first2=Dario |date=30 November 2015 |title=Artificial intelligence service "ORES" gives Wikipedians X-ray specs to see through bad edits |url=https://wikimediafoundation.org/2015/11/30/artificial-intelligence-x-ray-specs/ |website=Wikimedia Foundation}}</ref>
 
=== Wiki bots ===
{{See also|Wikipedia bots}}{{Excerpt|Vandalism on Wikipedia|ClueBot NG}}
 
===Detox===
Detox was a project by Google, in collaboration with the Wikimedia Foundation, to research methods that could be used to address users posting unkind comments in Wikimedia community discussions.<ref>{{cite web |title=Research:Detox |url=https://meta.wikimedia.org/wiki/Research:Detox |website=Meta-Wiki |language=en}}</ref> Among other parts of the Detox project, the Wikimedia Foundation and [[Jigsaw (company)|Jigsaw]] collaborated to use artificial intelligence for basic research and to develop technical solutions{{examples needed|date=April 2023}} to address the problem. In October 2016 those organizations published "Ex Machina: Personal Attacks Seen at Scale" describing their findings.<ref>{{Cite book |pages=1391–1399 |doi=10.1145/3038912.3052591 |arxiv=1610.08914 |year=2017 |last1=Wulczyn |first1=Ellery |last2=Thain |first2=Nithum |last3=Dixon |first3=Lucas |title=Proceedings of the 26th International Conference on World Wide Web |chapter=Ex Machina: Personal Attacks Seen at Scale |isbn=9781450349130 |s2cid=6060248 }}</ref><ref>{{cite web |author1=Jigsaw |title=Algorithms And Insults: Scaling Up Our Understanding Of Harassment On Wikipedia |url=https://medium.com/jigsaw/algorithms-and-insults-scaling-up-our-understanding-of-harassment-on-wikipedia-6cc417b9f7ff |website=Medium |date=7 February 2017}}</ref> Various popular media outlets reported on the publication of this paper and described the social context of the research.<ref>{{cite news |last1=Wakabayashi |first1=Daisuke |title=Google Cousin Develops Technology to Flag Toxic Online Comments |url=https://www.nytimes.com/2017/02/23/technology/google-jigsaw-monitor-toxic-online-comments.html |work=[[The New York Times]] |language=en |date=23 February 2017}}</ref><ref>{{cite web |last1=Smellie |first1=Sarah |title=Inside Wikipedia's Attempt to Use Artificial Intelligence to Combat Harassment |url=https://www.vice.com/en/article/wikipedia-jigsaw-google-artificial-intelligence/ |website=Motherboard |publisher=[[Vice Media]] |language=en-us |date=17 February 2017}}</ref><ref>{{cite web |last1=Gershgorn |first1=Dave |title=Alphabet's hate-fighting AI doesn't understand hate yet |url=https://qz.com/918640/alphabets-hate-fighting-ai-doesnt-understand-hate-yet/ |website=Quartz |date=27 February 2017}}</ref>
 
===Bias reduction===
In August 2018, a company called Primer reported attempting to use artificial intelligence to create Wikipedia articles about women as a way to address [[gender bias on Wikipedia]].<ref>{{Cite magazine |last1=Simonite |first1=Tom |date=3 August 2018 |title=Using Artificial Intelligence to Fix Wikipedia's Gender Problem |url=https://www.wired.com/story/using-artificial-intelligence-to-fix-wikipedias-gender-problem/ |magazine=Wired}}</ref><ref>{{cite web |last1=Verger |first1=Rob |date=7 August 2018 |title=Artificial intelligence can now help write Wikipedia pages for overlooked scientists |url=https://www.popsci.com/artificial-intelligence-scientists-wikipedia |website=Popular Science |language=en}}</ref>
 
[[File:DeepL machine translation of English Wikipedia example.png|thumb|Machine translation software such as [[DeepL]] is used by contributors.<ref>{{cite journal |last1=Costa-jussà |first1=Marta R. |last2=Cross |first2=James |last3=Çelebi |first3=Onur |last4=Elbayad |first4=Maha |last5=Heafield |first5=Kenneth |last6=Heffernan |first6=Kevin |last7=Kalbassi |first7=Elahe |last8=Lam |first8=Janice |last9=Licht |first9=Daniel |last10=Maillard |first10=Jean |last11=Sun |first11=Anna |last12=Wang |first12=Skyler |last13=Wenzek |first13=Guillaume |last14=Youngblood |first14=Al |last15=Akula |first15=Bapi |last16=Barrault |first16=Loic |last17=Gonzalez |first17=Gabriel Mejia |last18=Hansanti |first18=Prangthip |last19=Hoffman |first19=John |last20=Jarrett |first20=Semarley |last21=Sadagopan |first21=Kaushik Ram |last22=Rowe |first22=Dirk |last23=Spruit |first23=Shannon |last24=Tran |first24=Chau |last25=Andrews |first25=Pierre |last26=Ayan |first26=Necip Fazil |last27=Bhosale |first27=Shruti |last28=Edunov |first28=Sergey |last29=Fan |first29=Angela |last30=Gao |first30=Cynthia |last31=Goswami |first31=Vedanuj |last32=Guzmán |first32=Francisco |last33=Koehn |first33=Philipp |last34=Mourachko |first34=Alexandre |last35=Ropers |first35=Christophe |last36=Saleem |first36=Safiyyah |last37=Schwenk |first37=Holger |last38=Wang |first38=Jeff |title=Scaling neural machine translation to 200 languages |journal=Nature |date=June 2024 |volume=630 |issue=8018 |pages=841–846 |doi=10.1038/s41586-024-07335-x |pmid=38839963 |language=en |issn=1476-4687|pmc=11208141 |bibcode=2024Natur.630..841N }}</ref><ref name="nyt180724"/><ref name="considerations">{{cite arXiv |title=Considerations for Multilingual Wikipedia Research |eprint=2204.02483 |last1=Johnson |first1=Isaac |last2=Lescak |first2=Emily |date=2022 |class=cs.CY }}</ref><ref>{{cite book |last1=Mamadouh |first1=Virginie |title=Handbook of the Changing World Language Map |date=2020 |publisher=Springer International Publishing 
|isbn=978-3-030-02438-3 |pages=3773–3799 |chapter-url=https://link.springer.com/referenceworkentry/10.1007/978-3-030-02438-3_200 |language=en |chapter=Wikipedia: Mirror, Microcosm, and Motor of Global Linguistic Diversity|doi=10.1007/978-3-030-02438-3_200 |quote=Some versions have expanded dramatically using machine translation through the work of bots or web robots generating articles by translating them automatically from the other Wikipedias, often the English Wikipedia. […] In any event, the English Wikipedia is different from the others because it clearly serves a global audience, while other versions serve more localized audience, even if the Portuguese, Spanish, and French Wikipedias also serves a public spread across different continents}}</ref> More than 40% of Wikipedia's active editors
are on the [[English Wikipedia]].<ref>{{cite arXiv |title=InfoSync: Information Synchronization across Multilingual Semi-structured Tables |eprint=2307.03313 |last1=Khincha |first1=Siddharth |last2=Jain |first2=Chelsi |last3=Gupta |first3=Vivek |last4=Kataria |first4=Tushar |last5=Zhang |first5=Shuo |date=2023 |class=cs.CL }}</ref>]]
===Generative models===
[[File:Wikipedia - Artificial intelligence in Wikimedia projects (spoken by AI voice).mp3|thumb|Wikipedia articles can be read using AI voice technology.]]
====Text====
In 2022, the public release of [[ChatGPT]] inspired more experimentation with AI and writing Wikipedia articles. A debate was sparked about whether and to what extent such [[large language model]]s are suitable for such purposes in light of their tendency to [[Hallucination (artificial intelligence)|generate plausible-sounding misinformation]], including fake references; to generate prose that is not encyclopedic in tone; and to [[Algorithmic bias|reproduce biases]].<ref>{{Cite web |last=Harrison |first=Stephen |date=2023-01-12 |title=Should ChatGPT Be Used to Write Wikipedia Articles? |url=https://slate.com/technology/2023/01/chatgpt-wikipedia-articles.html |access-date=2023-01-13 |website=Slate Magazine |language=en}}</ref><ref name="vice">{{cite news |last1=Woodcock |first1=Claire |date=2 May 2023 |title=AI Is Tearing Wikipedia Apart |url=https://www.vice.com/en/article/ai-is-tearing-wikipedia-apart/ |work=Vice |language=en}}</ref> Since 2023, work has been done to [[Wikipedia:Artificial intelligence#Discussion timeline|draft Wikipedia policy on ChatGPT]] and similar [[large language model]]s (LLMs), for example recommending that users who are unfamiliar with LLMs avoid using them due to the aforementioned risks, as well as noting the potential for [[libel]] or [[copyright infringement]].<ref name="vice"/> Some relevant policies are linked at [[Wikipedia:WikiProject AI Cleanup/Policies|WikiProject AI Cleanup/Policies]].
 
====Other media====
A [[WikiProject]] exists for finding and removing [[AI generated content on Wikipedia|AI-generated text]] and images, called {{srlink|Wikipedia:WikiProject AI Cleanup|WikiProject AI Cleanup}}.<ref>{{Cite news |last=Maiberg |first=Emanuel |date=October 9, 2024 |title=The Editors Protecting Wikipedia from AI Hoaxes |url=https://www.404media.co/the-editors-protecting-wikipedia-from-ai-hoaxes/ |access-date=October 9, 2024 |work=[[404 Media]]}}</ref>
 
==== Simple Article Summaries ====
In 2025, the Wikimedia Foundation began testing a "Simple Article Summaries" feature that would provide AI-generated summaries of Wikipedia articles, similar to [[Google Search]]'s [[AI Overviews]]. The feature met immediate and harsh criticism from Wikipedia editors, who called it a "ghastly idea" and a "PR hype stunt". They warned that AI's tendency to [[Hallucination (artificial intelligence)|hallucinate]] could erode trust in the site, and questioned whether the feature was needed.<ref>https://arstechnica.com/ai/2025/06/yuck-wikipedia-pauses-ai-summaries-after-editor-revolt/</ref> The backlash led the Wikimedia Foundation to pause the rollout of Simple Article Summaries, while indicating continued interest in how generative AI could be integrated into Wikipedia.<ref>https://techcrunch.com/2025/06/11/wikipedia-pauses-ai-generated-summaries-pilot-after-editors-protest/</ref>

==Using artificial intelligence for other Wikimedia projects==
 
== Using Wikipedia for artificial intelligence ==
[[File:Models of high-quality language data – (a) Composition of high-quality datasets - The Pile (left), PaLM (top-right), MassiveText (bottom-right).png|thumb|Datasets of Wikipedia are widely used for training AI models.<ref>{{cite arXiv |title=Will we run out of data? Limits of LLM scaling based on human-generated data |eprint=2211.04325 |last1=Villalobos |first1=Pablo |last2=Ho |first2=Anson |last3=Sevilla |first3=Jaime |last4=Besiroglu |first4=Tamay |last5=Heim |first5=Lennart |last6=Hobbhahn |first6=Marius |date=2022 |class=cs.LG }}</ref>]]
Content in Wikimedia projects is useful as a dataset in advancing artificial intelligence research and applications. For instance, in the development of Google's [[Perspective API]], which identifies toxic comments in online forums, a dataset containing hundreds of thousands of Wikipedia talk page comments with human-labelled toxicity levels was used.<ref>{{Cite news|url=https://www.engadget.com/2017/09/01/google-perspective-comment-ranking-system/|title=Google's comment-ranking system will be a hit with the alt-right|work=Engadget|date=2017-09-01}}</ref> Subsets of the Wikipedia corpus are considered the largest well-curated data sets available for AI training.<ref name="nyt180724"/><ref name="considerations"/>
 
A 2012 paper reported that more than 1,000 academic articles, including those using artificial intelligence, examine Wikipedia, reuse information from Wikipedia, use technical extensions linked to Wikipedia, or research communication about Wikipedia.<ref>{{cite journal |last1=Nielsen |first1=Finn Årup |title=Wikipedia Research and Tools: Review and Comments |journal=SSRN Working Paper Series |date=2012 |doi=10.2139/ssrn.2129874 |language=en |issn=1556-5068|doi-access=free }}</ref> A 2017 paper described Wikipedia as the [[mother lode]] for human-generated text available for machine learning.<ref>{{cite journal |last1=Mehdi |first1=Mohamad |last2=Okoli |first2=Chitu |last3=Mesgari |first3=Mostafa |last4=Nielsen |first4=Finn Årup |last5=Lanamäki |first5=Arto |title=Excavating the mother lode of human-generated text: A systematic review of research that uses the wikipedia corpus |journal=Information Processing & Management |volume=53 |issue=2 |pages=505–529 |doi=10.1016/j.ipm.2016.07.003 |date=March 2017|s2cid=217265814 |url=http://urn.fi/urn:nbn:fi-fe202003057304 }}</ref>
 
A 2016 research project called "One Hundred Year Study on Artificial Intelligence" named Wikipedia as a key early project for understanding the interplay between artificial intelligence applications and human engagement.<ref>{{cite web |title=AI Research Trends - One Hundred Year Study on Artificial Intelligence (AI100) |url=https://ai100.stanford.edu/2016-report/section-i-what-artificial-intelligence/ai-research-trends |website=ai100.stanford.edu |language=en}}</ref>
 
There is a concern about the lack of [[Creative Commons license#Attribution|attribution]] to Wikipedia articles in large language models like ChatGPT.<ref name="nyt180724">{{cite news |title=Wikipedia's Moment of Truth |url=https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html |access-date=29 November 2024 |work=New York Times |date=18 July 2023 }}</ref><ref>{{cite news |title=Wikipedia Built the Internet's Brain. Now Its Leaders Want Credit. |url=https://observer.com/2025/03/wikimedia-foundation-execs-speak-on-ai-scraping-attribution-and-wikipedias-future/ |access-date=2 April 2025 |work=Observer |date=28 March 2025 |quote=Attributions, however, remain a sticking point. Citations not only give credit but also help Wikipedia attract new editors and donors. "If our content is getting sucked into an LLM without attribution or links, that's a real problem for us in the short term."}}</ref> While Wikipedia's licensing policy lets anyone use its texts, including in modified forms, it does have the condition that credit is given, implying that using its contents in answers by AI models without clarifying the sourcing may violate its terms of use.<ref name="nyt180724"/>
 
==See also==
{{Commons category|Wikimedia projects and AI}}