Artificial intelligence in Wikimedia projects: Difference between revisions

Revision as of 21:33, 25 August 2025 by Pxldnky77 (155 edits): m →Reactions: Added source back which I forgot (Tag: Visual edit)
Latest revision as of 09:59, 27 August 2025 by Alalch E. (extended confirmed user, new page reviewer, rollbacker; 35,363 edits): in reality, there's no POV problem
(8 intermediate revisions by 4 users not shown)
Line 1: ~~{{Short description\|none}}~~ ~~{{Cleanup\|date=August 2025\|reason=The article contains exorbitant amounts of anti-AI info.}}~~ {{Short description\|none}} [[File:Example_of_AI-generated_article_getting_nominated_for_speedy_deletion.png\|thumb\|246x246px\|AI-generated draft article getting nominated for [[speedy deletion]] under G15 criteria]] Line 23 ⟶ 19: === Beginnings of generative AI === In 2022, the public release of [[ChatGPT]] inspired more experimentation with AI and writing Wikipedia articles. A debate was sparked about whether and to what extent such [[large language model]]s are suitable for such purposes in light of their tendency to [[Hallucination (artificial intelligence)\|generate plausible-sounding misinformation]], including fake references; to generate prose that is not encyclopedic in tone; and to [[Algorithmic bias\|reproduce biases]].<ref>{{Cite web \|last=Harrison \|first=Stephen \|date=2023-01-12 \|title=Should ChatGPT Be Used to Write Wikipedia Articles? \|url=https://slate.com/technology/2023/01/chatgpt-wikipedia-articles.html \|access-date=2023-01-13 \|website=Slate Magazine \|language=en}}</ref><ref name="vice">{{cite news \|last1=Woodcock \|first1=Claire \|date=2 May 2023 \|title=AI Is Tearing Wikipedia Apart \|url=https://www.vice.com/en/article/ai-is-tearing-wikipedia-apart/ \|work=Vice \|language=en}}</ref> Since 2023, work has been done to [[Wikipedia:Artificial intelligence#Discussion timeline\|draft Wikipedia policy on ChatGPT]] and similar [[large language model]]s (LLMs), e.g. at times recommending that users who are unfamiliar with LLMs should avoid using them due to the aforementioned risks, as well as noting the potential for [[libel]] or [[copyright infringement]].<ref name="vice" /> On December 6, 2022, a Wikipedia contributor named Pharos created the article "[[Artwork title]]" in his sandbox, declaring he used ChatGPT to experiment with it and would extensively modify it. 
Another editor tagged the article as "[[original research]]", arguing that the article was initially unsourced AI-generated content and was sourced afterwards, instead of being based on reliable sources from the outset. Another editor who experimented with this early version of ChatGPT said that ChatGPT's overview of the topic was decent, but that the citations were fabricated.<ref>{{Cite web \|last=Harrison \|first=Stephen \|date=January 12, 2023 \|title=Should ChatGPT Be Used to Write Wikipedia Articles? \|url=https://slate.com/technology/2023/01/chatgpt-wikipedia-articles.html \|website=[[Slate Magazine]]}}</ref> [[Wiki Education Foundation\|The Wiki Education Foundation]] reported that some experienced editors found AI to be useful in starting drafts or creating new articles. It said that ChatGPT “knows” what Wikipedia articles look like and can easily generate one that is written in the style of Wikipedia, but warned that ChatGPT had a tendency to use promotional language, among other issues.<ref>{{Cite web \|last=Ross \|first=Sage \|date=February 21, 2023 \|title=ChatGPT, Wikipedia, and student writing assignments \|url=https://wikiedu.org/blog/2023/02/21/chatgpt-wikipedia-and-student-writing-assignments/ \|website=[[Wiki Education Foundation]]}}</ref> Miguel García, a former Wikimedia member from Spain, said that when ChatGPT was originally launched, the number of AI-generated articles on the site peaked. He added that the rate of AI articles has now stabilized due to the community's efforts to combat them. He said that the majority of the articles that have no sources are deleted instantly or are nominated for deletion.<ref>{{Cite web \|last=Bejerano \|first=Pablo G. 
\|date=August 10, 2024 \|title=How Wikipedia is surviving in the age of ChatGPT \|url=https://english.elpais.com/technology/2024-08-10/how-wikipedia-is-surviving-in-the-age-of-chatgpt.html \|access-date= \|website=[[El País]] \|language=en-us}}</ref>▼ In 2023, the Wikipedia community created a [[WikiProject]] named [[Wikipedia:WikiProject AI Cleanup\|AI Cleanup]] to assist in the removal of poor-quality AI content from Wikipedia. In October 2024, a study by [[Princeton University]] revealed that about 5% of 3,000 newly created articles (created in August 2024) on [[English Wikipedia]] were created using AI. The study said that some of the AI articles were on innocuous topics and that AI had likely only been used to assist in writing. For some other articles, AI had been used to promote [[Business\|businesses]] or political interests.<ref name=":0">{{Cite news \|last=Wu \|first=Daniel \|date=August 8, 2025 \|title=Volunteers fight to keep 'AI slop' off Wikipedia \|url=https://www.washingtonpost.com/technology/2025/08/08/wikipedia-ai-generated-mistakes-editors/ \|access-date= \|newspaper=[[The Washington Post]] \|language=en-US \|issn=0190-8286}}</ref><ref>{{Cite web \|last=Stokel-Walker \|first=Chris \|date=November 1, 2024 \|title=One in 20 new Wikipedia pages seem to be written with the help of AI \|url=https://www.newscientist.com/article/2454256-one-in-20-new-wikipedia-pages-seem-to-be-written-with-the-help-of-ai/ \|url-access=subscription \|access-date= \|website=[[New Scientist]] \|language=en-US}}</ref> In August 2025, the Wikipedia community created a policy that allowed users to nominate suspected AI-generated articles for [[speedy deletion]]. Editors usually recognize AI-generated articles because they use fabricated citations or citations that are not related to the subject of the article. The wording of articles is also used to recognize AI writing. 
For example, if an article uses language that reads like an [[LLM]] response to a user, such as "Here is your Wikipedia article on" or "Up to my last training update", the article is typically tagged for speedy deletion.<ref name=":020"~~>{{Cite news \|last=Wu \|first=Daniel \|date=August 8, 2025 \|title=Volunteers fight to keep 'AI slop' off Wikipedia \|url=https:~~/~~/www.washingtonpost.com/technology/2025/08/08/wikipedia-ai-generated-mistakes-editors/ \|access-date= \|newspaper=[[The Washington Post]] \|language=en-US \|issn=0190-8286}}</ref~~><ref>{{Cite web \|last=Maiberg \|first=Emanuel \|date=August 5, 2025 \|title=Wikipedia Editors Adopt 'Speedy Deletion' Policy for AI Slop Articles \|url=https://www.404media.co/wikipedia-editors-adopt-speedy-deletion-policy-for-ai-slop-articles/ \|access-date= \|website=[[404 Media]] \|language=en}}</ref> Other signs of AI use include excessive use of [[em dashes]], overuse of the word "moreover", promotional material in articles that describes something as "breathtaking", and formatting issues like using curly [[Quotation mark\|quotation marks]] instead of straight ones. During the discussion on implementing the speedy deletion policy, one user, who is an article reviewer, said that he is “flooded non-stop with horrendous drafts” created using AI. 
Other users said that AI articles have a large amount of “lies and fake references” and that it takes a significant amount of time to fix the issues.<ref>{{Cite web \|last=Roth \|first=Emma \|date=August 8, 2025 \|title=How Wikipedia is fighting AI slop content \|url=https://www.theverge.com/report/756810/wikipedia-ai-slop-policies-community-speedy-deletion \|url-access=subscription \|archive-url=https://web.archive.org/web/20250810012316/https://www.theverge.com/report/756810/wikipedia-ai-slop-policies-community-speedy-deletion \|archive-date=August 10, 2025 \|access-date= \|website=[[The Verge]] \|language=en-US}}</ref><ref>{{Cite web \|last=Gills \|first=Drew \|date=August 8, 2025 \|title=Read this: How Wikipedia identifies and removes AI slop \|url=https://www.avclub.com/wikipedia-ai-slop-read-this \|access-date= \|website=[[AV Club]] \|language=en-US}}</ref>▼ ▲On December 6, 2022, a Wikipedia contributor named Pharos created the article "[[Artwork title]]" in his sandbox, declaring he used ChatGPT to experiment with it and would extensively modify it. Another editor tagged the article as "[[original research]]", arguing that the article was initially unsourced AI-generated content, and sourced afterwards, instead of being based on reliable sources from the outset. Another editor who experimented with this early version of ChatGPT said that ChatGPT's overview of the topic was decent, but that the citations were fabricated.<ref>{{Cite web \|last=Harrison \|first=Stephen \|date=January 12, 2023 \|title=Should ChatGPT Be Used to Write Wikipedia Articles? \|url=https://slate.com/technology/2023/01/chatgpt-wikipedia-articles.html \|website=[[Slate Magazine]]}}</ref> [[Wiki Education Foundation\|The Wiki Education Foundation]] reported that some experienced editors found AI to be useful in starting drafts or creating new articles. 
It said that ChatGPT “knows” what Wikipedia articles look like and can easily generate one that is written in the style of Wikipedia, but warned that ChatGPT had a tendency to use promotional language, among other issues.<ref>{{Cite web \|last=Ross \|first=Sage \|date=February 21, 2023 \|title=ChatGPT, Wikipedia, and student writing assignments \|url=https://wikiedu.org/blog/2023/02/21/chatgpt-wikipedia-and-student-writing-assignments/ \|website=[[Wiki Education Foundation]]}}</ref> Miguel García, a former Wikimedia member from Spain, said that when ChatGPT was originally launched, the number of AI-generated articles on the site peaked. He added that the rate of AI articles has now stabilized due to the community's efforts to combat them. He said that the majority of the articles that have no sources are deleted instantly or are nominated for deletion.<ref>{{Cite web \|last=Bejerano \|first=Pablo G. \|date=August 10, 2024 \|title=How Wikipedia is surviving in the age of ChatGPT \|url=https://english.elpais.com/technology/2024-08-10/how-wikipedia-is-surviving-in-the-age-of-chatgpt.html \|access-date= \|website=[[El País]] \|language=en-us}}</ref> ▲In August 2025, the Wikipedia community created a policy that allowed users to nominate suspected AI-generated articles for [[speedy deletion]]. Editors usually recognize AI-generated articles because they use fabricated citations or citations that are not related to the subject of the article. The wording of articles is also used to recognize AI writing. 
For example, if an article uses language that reads like an [[LLM]] response to a user, such as "Here is your Wikipedia article on" or "Up to my last training update", the article is typically tagged for speedy deletion.<ref name=":02">{{Cite news \|last=Wu \|first=Daniel \|date=August 8, 2025 \|title=Volunteers fight to keep 'AI slop' off Wikipedia \|url=https://www.washingtonpost.com/technology/2025/08/08/wikipedia-ai-generated-mistakes-editors/ \|access-date= \|newspaper=[[The Washington Post]] \|language=en-US \|issn=0190-8286}}</ref><ref>{{Cite web \|last=Maiberg \|first=Emanuel \|date=August 5, 2025 \|title=Wikipedia Editors Adopt 'Speedy Deletion' Policy for AI Slop Articles \|url=https://www.404media.co/wikipedia-editors-adopt-speedy-deletion-policy-for-ai-slop-articles/ \|access-date= \|website=[[404 Media]] \|language=en}}</ref> Other signs of AI use include excessive use of [[em dashes]], overuse of the word "moreover", promotional material in articles that describes something as "breathtaking", and formatting issues like using curly [[Quotation mark\|quotation marks]] instead of straight ones. During the discussion on implementing the speedy deletion policy, one user, who is an article reviewer, said that he is “flooded non-stop with horrendous drafts” created using AI. 
Other users said that AI articles have a large amount of “lies and fake references” and that it takes a significant amount of time to fix the issues.<ref>{{Cite web \|last=Roth \|first=Emma \|date=August 8, 2025 \|title=How Wikipedia is fighting AI slop content \|url=https://www.theverge.com/report/756810/wikipedia-ai-slop-policies-community-speedy-deletion \|url-access=subscription \|archive-url=https://web.archive.org/web/20250810012316/https://www.theverge.com/report/756810/wikipedia-ai-slop-policies-community-speedy-deletion \|archive-date=August 10, 2025 \|access-date= \|website=[[The Verge]] \|language=en-US}}</ref><ref>{{Cite web \|last=Gills \|first=Drew \|date=August 8, 2025 \|title=Read this: How Wikipedia identifies and removes AI slop \|url=https://www.avclub.com/wikipedia-ai-slop-read-this \|access-date= \|website=[[AV Club]] \|language=en-US}}</ref> Ilyas Lebleu, founder of WikiProject AI Cleanup, said that ~~he~~they and ~~his~~their fellow editors noticed a pattern of unnatural writing that ~~they~~could ~~managed to~~be ~~connect~~connected to ChatGPT. 
~~He~~They added that AI is able to mass-produce content that sounds real while being completely fake, leading to the creation of [[hoax]] articles on Wikipedia that ~~he~~they ~~was~~were tasked with deleting.<ref>{{Cite web \|last=Maiberg \|first=Emanuel \|date=October 9, 2024 \|title=The Editors Protecting Wikipedia from AI Hoaxes \|url=https://www.404media.co/the-editors-protecting-wikipedia-from-ai-hoaxes/ \|url-access=subscription \|access-date= \|website=[[404 Media]] \|language=en}}</ref><ref>{{Cite web \|last=Lomas \|first=Natasha \|date=October 11, 2024 \|title=How AI-generated content is upping the workload for Wikipedia editors \|url=https://techcrunch.com/2024/10/11/how-ai-generated-content-is-upping-the-workload-for-wikipedia-editors/ \|access-date= \|website=[[TechCrunch]] \|language=en-US}}</ref> Wikipedia created a guide on how to spot signs of AI-generated writing, titled "[[Wikipedia:Signs of AI writing\|Signs of AI writing]]".<ref>{{Cite web \|last=Clair \|first=Grant \|date=August 20, 2025 \|title=Wikipedia publishes list of AI writing tells \|url=https://boingboing.net/2025/08/20/wikipedia-publishes-list-of-ai-writing-tells.html \|access-date= \|website=[[Boing Boing]] \|language=en-US}}</ref> === Hoaxes and malicious AI use === In 2023, researchers discovered that ChatGPT frequently fabricates information and makes up fake articles for its users. At that time, a ban on AI was deemed "too harsh" by the community.<ref>{{Cite web \|last=Woodcock \|first=Claire \|date=May 2, 2023 \|title=AI Is Tearing Wikipedia Apart \|url=https://www.vice.com/en/article/ai-is-tearing-wikipedia-apart/ \|archive-url=https://web.archive.org/web/20241004054831/https://www.vice.com/en/article/ai-is-tearing-wikipedia-apart/ \|archive-date=October 4, 2024 \|website=[[Vice Magazine]]}}</ref><ref>{{Cite web \|last=Harrison \|first=Stephen \|date=August 24, 2023 \|title=Wikipedia Will Survive A.I. 
\|url=https://slate.com/technology/2023/08/wikipedia-artificial-intelligence-threat.html \|website=[[Slate Magazine]]}}</ref> AI has been deliberately used to create various hoax articles on Wikipedia. For example, an in-depth 2,000-word article about an Ottoman fortress that never existed was found by Ilyas Lebleu and ~~his~~their team.<ref>{{Cite web \|last=Durpe \|first=Maggie \|date=October 10, 2024 \|title=Wikipedia Declares War on AI Slop \|url=https://futurism.com/the-byte/wikipedia-declares-war-ai-slop \|access-date= \|website=[[Futurism (website)\|Futurism]]}}</ref><ref>{{Cite web \|last=Funaki \|first=Kaiyo \|date=October 25, 2024 \|title=Wikipedia editors form urgent task force to combat rampant issues with recent wave of content: 'The entire thing was ... [a] hoax' \|url=https://www.thecooldown.com/green-business/ai-content-wikipedia-volunteers-editing/ \|website=TCD}}</ref> Another example showed a user adding AI-generated misinformation to an article on [[Estola albosignata\|''Estola albosignata'']], a species of beetle. The paragraph seemed normal but referenced an unrelated article.<ref>{{Cite web \|last=Nine \|first=Adrianna \|date=October 9, 2024 \|title=People Are Stuffing Wikipedia with AI-Generated Garbage \|url=https://www.extremetech.com/internet/people-are-stuffing-wikipedia-with-ai-generated-garbage \|access-date= \|website=[[ExtremeTech]] \|language=en}}</ref> AI has been used on Wikipedia to advocate for certain political viewpoints in articles covered by [[Contentious topics on Wikipedia\|contentious topic]] guidelines. One instance showed a banned editor using AI to engage in [[edit wars]] and manipulate [[Albanian history]]-related articles. 
Other instances included users generating articles about political movements or weapons, but dedicating the majority of the content to a different subject, such as by pointedly referencing [[JD Vance]] or [[Volodymyr Zelenskyy\|Volodymyr Zelensky]].<ref>{{Cite web \|last1=Brooks \|first1=Creston \|last2=Eggert \|first2=Samuel \|last3=Peskoff \|first3=Dennis \|date=October 7, 2024 \|title=The Rise of AI-Generated Content in Wikipedia \|url=https://arxiv.org/html/2410.08044v1 \|access-date= \|website=[[ArXiv]] \|language=en}}</ref> === Simple Article Summaries === In 2025, Wikimedia started testing a "Simple Article Summaries" feature which would provide AI-generated summaries of Wikipedia articles, similar to [[Google Search]]'s [[AI Overviews]]. The decision was met with immediate and harsh criticism from Wikipedia editors, who called the feature a "ghastly idea" and a "PR hype stunt." They criticized a perceived loss of trust in the site due to AI's tendency to [[Hallucination (artificial intelligence)\|hallucinate]] and questioned the necessity of the feature.<ref>{{Cite web\|url=https://arstechnica.com/ai/2025/06/yuck-wikipedia-pauses-ai-summaries-after-editor-revolt/\|title=“Yuck”: Wikipedia pauses AI summaries after editor revolt\|first=Ryan\|last=Whitwam\|date=June 11, 2025\|website=Ars Technica}}</ref> The negative criticism led Wikimedia to halt the rollout of Simple Article Summaries while hinting that they are still interested in how generative AI could be integrated into Wikipedia.<ref>{{Cite web\|url=https://techcrunch.com/2025/06/11/wikipedia-pauses-ai-generated-summaries-pilot-after-editors-protest/\|title=Wikipedia pauses AI-generated summaries pilot after editors protest\|first=Kyle\|last=Wiggers\|date=June 11, 2025}}</ref> ==Using artificial intelligence for other Wikimedia projects==▼ === Detox ===▼ Detox was a project by Google, in collaboration with the Wikimedia Foundation, to research methods that could be used to address users posting unkind 
comments in Wikimedia community discussions.<ref>{{Cite book \|title=Research:Detox - Meta \|url=https://meta.wikimedia.org/wiki/Research:Detox \|language=en}}</ref> Among other parts of the Detox project, the Wikimedia Foundation and [[Jigsaw (company)\|Jigsaw]] collaborated to use artificial intelligence for basic research and to develop technical solutions{{examples needed\|date=April 2023}} to address the problem. In October 2016 those organizations published "Ex Machina: Personal Attacks Seen at Scale" describing their findings.<ref>{{Cite book \|pages=1391–1399 \|doi=10.1145/3038912.3052591 \|arxiv=1610.08914\|year=2017 \|last1=Wulczyn \|first1=Ellery \|last2=Thain \|first2=Nithum \|last3=Dixon \|first3=Lucas \|title=Proceedings of the 26th International Conference on World Wide Web \|chapter=Ex Machina: Personal Attacks Seen at Scale \|isbn=9781450349130 \|s2cid=6060248 }}</ref><ref>{{cite web \|author1=Jigsaw \|title=Algorithms And Insults: Scaling Up Our Understanding Of Harassment On Wikipedia \|url=https://medium.com/jigsaw/algorithms-and-insults-scaling-up-our-understanding-of-harassment-on-wikipedia-6cc417b9f7ff \|website=Medium \|date=7 February 2017}}</ref> Various popular media outlets reported on the publication of this paper and described the social context of the research.<ref>{{cite news \|last1=Wakabayashi \|first1=Daisuke \|title=Google Cousin Develops Technology to Flag Toxic Online Comments \|url=https://www.nytimes.com/2017/02/23/technology/google-jigsaw-monitor-toxic-online-comments.html \|journal=The New York Times \|language=en \|date=23 February 2017}}</ref><ref>{{cite web \|last1=Smellie \|first1=Sarah \|title=Inside Wikipedia's Attempt to Use Artificial Intelligence to Combat Harassment \|url=https://www.vice.com/en/article/wikipedia-jigsaw-google-artificial-intelligence/ \|website=Motherboard \|publisher=[[Vice Media]] \|language=en-us \|date=17 February 2017}}</ref><ref>{{cite web \|last1=Gershgorn \|first1=Dave \|title=Alphabet's 
hate-fighting AI doesn't understand hate yet \|url=https://qz.com/918640/alphabets-hate-fighting-ai-doesnt-understand-hate-yet/ \|website=Quartz \|date=27 February 2017}}</ref>▼ [[File:DeepL machine translation of English Wikipedia example.png\|thumb\|Machine translation software such as [[DeepL]] is used by contributors.<ref>{{cite journal \|last1=Costa-jussà \|first1=Marta R. \|last2=Cross \|first2=James \|last3=Çelebi \|first3=Onur \|last4=Elbayad \|first4=Maha \|last5=Heafield \|first5=Kenneth \|last6=Heffernan \|first6=Kevin \|last7=Kalbassi \|first7=Elahe \|last8=Lam \|first8=Janice \|last9=Licht \|first9=Daniel \|last10=Maillard \|first10=Jean \|last11=Sun \|first11=Anna \|last12=Wang \|first12=Skyler \|last13=Wenzek \|first13=Guillaume \|last14=Youngblood \|first14=Al \|last15=Akula \|first15=Bapi \|last16=Barrault \|first16=Loic \|last17=Gonzalez \|first17=Gabriel Mejia \|last18=Hansanti \|first18=Prangthip \|last19=Hoffman \|first19=John \|last20=Jarrett \|first20=Semarley \|last21=Sadagopan \|first21=Kaushik Ram \|last22=Rowe \|first22=Dirk \|last23=Spruit \|first23=Shannon \|last24=Tran \|first24=Chau \|last25=Andrews \|first25=Pierre \|last26=Ayan \|first26=Necip Fazil \|last27=Bhosale \|first27=Shruti \|last28=Edunov \|first28=Sergey \|last29=Fan \|first29=Angela \|last30=Gao \|first30=Cynthia \|last31=Goswami \|first31=Vedanuj \|last32=Guzmán \|first32=Francisco \|last33=Koehn \|first33=Philipp \|last34=Mourachko \|first34=Alexandre \|last35=Ropers \|first35=Christophe \|last36=Saleem \|first36=Safiyyah \|last37=Schwenk \|first37=Holger \|last38=Wang \|first38=Jeff \|title=Scaling neural machine translation to 200 languages \|journal=Nature \|date=June 2024 \|volume=630 \|issue=8018 \|pages=841–846 \|doi=10.1038/s41586-024-07335-x \|pmid=38839963 \|language=en \|issn=1476-4687\|pmc=11208141 \|bibcode=2024Natur.630..841N }}</ref><ref name="nyt180724">{{cite news \|date=18 July 2023 \|title=Wikipedia's Moment of Truth 
\|url=https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html \|access-date=29 November 2024 \|work=New York Times}}</ref><ref name="considerations">{{cite arXiv \|title=Considerations for Multilingual Wikipedia Research \|eprint=2204.02483 \|last1=Johnson \|first1=Isaac \|last2=Lescak \|first2=Emily \|date=2022 \|class=cs.CY }}</ref><ref>{{cite book \|last1=Mamadouh \|first1=Virginie \|title=Handbook of the Changing World Language Map \|date=2020 \|publisher=Springer International Publishing \|isbn=978-3-030-02438-3 \|pages=3773–3799 \|chapter-url=https://link.springer.com/referenceworkentry/10.1007/978-3-030-02438-3_200 \|language=en \|chapter=Wikipedia: Mirror, Microcosm, and Motor of Global Linguistic Diversity\|doi=10.1007/978-3-030-02438-3_200 \|quote=Some versions have expanded dramatically using machine translation through the work of bots or web robots generating articles by translating them automatically from the other Wikipedias, often the English Wikipedia. […] In any event, the English Wikipedia is different from the others because it clearly serves a global audience, while other versions serve more localized audience, even if the Portuguese, Spanish, and French Wikipedias also serves a public spread across different continents}}</ref> More than 40% of Wikipedia's active editors▼ are in [[English Wikipedia]].<ref>{{cite arXiv \|title=InfoSync: Information Synchronization across Multilingual Semi-structured Tables \|eprint=2307.03313 \|last1=Khincha \|first1=Siddharth \|last2=Jain \|first2=Chelsi \|last3=Gupta \|first3=Vivek \|last4=Kataria \|first4=Tushar \|last5=Zhang \|first5=Shuo \|date=2023 \|class=cs.CL }}</ref>]]▼ == Using Wikipedia for artificial intelligence == In the development of Google's [[Perspective API]], which identifies toxic comments in online forums, a dataset containing hundreds of thousands of Wikipedia talk page comments with human-labelled toxicity levels was used.<ref>{{Cite news \|date=2017-09-01 
\|title=Google's comment-ranking system will be a hit with the alt-right \|url=https://www.engadget.com/2017/09/01/google-perspective-comment-ranking-system/ \|work=Engadget}}</ref> Subsets of the Wikipedia corpus are considered the largest well-curated data sets available for AI training.<ref name="nyt180724" /><ref name="considerations" /> Line 54 ⟶ 43: There is a concern about the lack of [[Creative Commons license#Attribution\|attribution]] to Wikipedia articles in large-language models like ChatGPT.<ref name="nyt180724" /><ref>{{cite news \|date=28 March 2025 \|title=Wikipedia Built the Internet's Brain. Now Its Leaders Want Credit. \|url=https://observer.com/2025/03/wikimedia-foundation-execs-speak-on-ai-scraping-attribution-and-wikipedias-future/ \|access-date=2 April 2025 \|work=Observer \|quote=Attributions, however, remain a sticking point. Citations not only give credit but also help Wikipedia attract new editors and donors. ” If our content is getting sucked into an LLM without attribution or links, that’s a real problem for us in the short term,”}}</ref> While Wikipedia's licensing policy lets anyone use its texts, including in modified forms, it does have the condition that credit is given, implying that using its contents in answers by AI models without clarifying the sourcing may violate its terms of use.<ref name="nyt180724" /> ▲==Using artificial intelligence for other Wikimedia projects== ▲=== Detox === ▲Detox was a project by Google, in collaboration with the Wikimedia Foundation, to research methods that could be used to address users posting unkind comments in Wikimedia community discussions.<ref>{{Cite book \|title=Research:Detox - Meta \|url=https://meta.wikimedia.org/wiki/Research:Detox \|language=en}}</ref> Among other parts of the Detox project, the Wikimedia Foundation and [[Jigsaw (company)\|Jigsaw]] collaborated to use artificial intelligence for basic research and to develop technical solutions{{examples needed\|date=April 2023}} to 
address the problem. In October 2016 those organizations published "Ex Machina: Personal Attacks Seen at Scale" describing their findings.<ref>{{Cite book \|pages=1391–1399 \|doi=10.1145/3038912.3052591 \|arxiv=1610.08914\|year=2017 \|last1=Wulczyn \|first1=Ellery \|last2=Thain \|first2=Nithum \|last3=Dixon \|first3=Lucas \|title=Proceedings of the 26th International Conference on World Wide Web \|chapter=Ex Machina: Personal Attacks Seen at Scale \|isbn=9781450349130 \|s2cid=6060248 }}</ref><ref>{{cite web \|author1=Jigsaw \|title=Algorithms And Insults: Scaling Up Our Understanding Of Harassment On Wikipedia \|url=https://medium.com/jigsaw/algorithms-and-insults-scaling-up-our-understanding-of-harassment-on-wikipedia-6cc417b9f7ff \|website=Medium \|date=7 February 2017}}</ref> Various popular media outlets reported on the publication of this paper and described the social context of the research.<ref>{{cite news \|last1=Wakabayashi \|first1=Daisuke \|title=Google Cousin Develops Technology to Flag Toxic Online Comments \|url=https://www.nytimes.com/2017/02/23/technology/google-jigsaw-monitor-toxic-online-comments.html \|journal=The New York Times \|language=en \|date=23 February 2017}}</ref><ref>{{cite web \|last1=Smellie \|first1=Sarah \|title=Inside Wikipedia's Attempt to Use Artificial Intelligence to Combat Harassment \|url=https://www.vice.com/en/article/wikipedia-jigsaw-google-artificial-intelligence/ \|website=Motherboard \|publisher=[[Vice Media]] \|language=en-us \|date=17 February 2017}}</ref><ref>{{cite web \|last1=Gershgorn \|first1=Dave \|title=Alphabet's hate-fighting AI doesn't understand hate yet \|url=https://qz.com/918640/alphabets-hate-fighting-ai-doesnt-understand-hate-yet/ \|website=Quartz \|date=27 February 2017}}</ref> ▲[[File:DeepL machine translation of English Wikipedia example.png\|thumb\|Machine translation software such as [[DeepL]] is used by contributors.<ref>{{cite journal \|last1=Costa-jussà \|first1=Marta R. 
\|last2=Cross \|first2=James \|last3=Çelebi \|first3=Onur \|last4=Elbayad \|first4=Maha \|last5=Heafield \|first5=Kenneth \|last6=Heffernan \|first6=Kevin \|last7=Kalbassi \|first7=Elahe \|last8=Lam \|first8=Janice \|last9=Licht \|first9=Daniel \|last10=Maillard \|first10=Jean \|last11=Sun \|first11=Anna \|last12=Wang \|first12=Skyler \|last13=Wenzek \|first13=Guillaume \|last14=Youngblood \|first14=Al \|last15=Akula \|first15=Bapi \|last16=Barrault \|first16=Loic \|last17=Gonzalez \|first17=Gabriel Mejia \|last18=Hansanti \|first18=Prangthip \|last19=Hoffman \|first19=John \|last20=Jarrett \|first20=Semarley \|last21=Sadagopan \|first21=Kaushik Ram \|last22=Rowe \|first22=Dirk \|last23=Spruit \|first23=Shannon \|last24=Tran \|first24=Chau \|last25=Andrews \|first25=Pierre \|last26=Ayan \|first26=Necip Fazil \|last27=Bhosale \|first27=Shruti \|last28=Edunov \|first28=Sergey \|last29=Fan \|first29=Angela \|last30=Gao \|first30=Cynthia \|last31=Goswami \|first31=Vedanuj \|last32=Guzmán \|first32=Francisco \|last33=Koehn \|first33=Philipp \|last34=Mourachko \|first34=Alexandre \|last35=Ropers \|first35=Christophe \|last36=Saleem \|first36=Safiyyah \|last37=Schwenk \|first37=Holger \|last38=Wang \|first38=Jeff \|title=Scaling neural machine translation to 200 languages \|journal=Nature \|date=June 2024 \|volume=630 \|issue=8018 \|pages=841–846 \|doi=10.1038/s41586-024-07335-x \|pmid=38839963 \|language=en \|issn=1476-4687\|pmc=11208141 \|bibcode=2024Natur.630..841N }}</ref><ref name="nyt180724">{{cite news \|date=18 July 2023 \|title=Wikipedia's Moment of Truth \|url=https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html \|access-date=29 November 2024 \|work=New York Times}}</ref><ref name="considerations">{{cite arXiv \|title=Considerations for Multilingual Wikipedia Research \|eprint=2204.02483 \|last1=Johnson \|first1=Isaac \|last2=Lescak \|first2=Emily \|date=2022 \|class=cs.CY }}</ref><ref>{{cite book \|last1=Mamadouh \|first1=Virginie 
\|title=Handbook of the Changing World Language Map \|date=2020 \|publisher=Springer International Publishing \|isbn=978-3-030-02438-3 \|pages=3773–3799 \|chapter-url=https://link.springer.com/referenceworkentry/10.1007/978-3-030-02438-3_200 \|language=en \|chapter=Wikipedia: Mirror, Microcosm, and Motor of Global Linguistic Diversity\|doi=10.1007/978-3-030-02438-3_200 \|quote=Some versions have expanded dramatically using machine translation through the work of bots or web robots generating articles by translating them automatically from the other Wikipedias, often the English Wikipedia. […] In any event, the English Wikipedia is different from the others because it clearly serves a global audience, while other versions serve more localized audience, even if the Portuguese, Spanish, and French Wikipedias also serves a public spread across different continents}}</ref> More than 40% of Wikipedia's active editors ▲are in [[English Wikipedia]].<ref>{{cite arXiv \|title=InfoSync: Information Synchronization across Multilingual Semi-structured Tables \|eprint=2307.03313 \|last1=Khincha \|first1=Siddharth \|last2=Jain \|first2=Chelsi \|last3=Gupta \|first3=Vivek \|last4=Kataria \|first4=Tushar \|last5=Zhang \|first5=Shuo \|date=2023 \|class=cs.CL }}</ref>]] == Reactions == In November 2023, Wikipedia co-founder [[Jimmy Wales]] said that AI is not a reliable source and that he is not going to use ChatGPT to write Wikipedia articles. In July 2025, he proposed the use of LLMs to provide customized default feedback when drafts are rejected.<ref>{{Cite web \|last=Maiberg \|first=Emanuel \|date=Aug 21, 2025 \|title=Jimmy Wales Says Wikipedia Could Use AI. 
Editors Call It the 'Antithesis of Wikipedia' \|url=https://www.404media.co/jimmy-wales-wikipedia-ai-chatgpt/ \|website=404 Media}}</ref> [[Wikimedia Foundation]] product director Marshall Miller said that WikiProject AI Cleanup keeps the site's content neutral and reliable and that AI enables the creation of low-quality content. When interviewed by [[404 Media]], Ilyas Lebleu described speedy deletion as a "band-aid" for more serious instances of AI use, and said that the bigger problem of AI use will continue. ~~He~~They also said that some AI articles are discussed for one week before being deleted.<ref>{{Cite web \|last=Crider \|first=Michael \|date=August 6, 2025 \|title=Wikipedia goes to war against AI slop articles with new deletion policy \|url=https://www.pcworld.com/article/2870079/wikipedia-goes-to-war-against-ai-slop-articles-with-new-deletion-policy.html \|access-date= \|website=[[PC World]] \|language=en}}</ref> [[File:Models of high-quality language data – (a) Composition of high-quality datasets - The Pile (left), PaLM (top-right), MassiveText (bottom-right).png\|thumb\|Datasets of Wikipedia are widely used for training AI models.<ref>{{cite arXiv \|eprint=2211.04325 \|class=cs.LG \|first1=Pablo \|last1=Villalobos \|first2=Anson \|last2=Ho \|title=Will we run out of data? Limits of LLM scaling based on human-generated data \|date=2022 \|last3=Sevilla \|first3=Jaime \|last4=Besiroglu \|first4=Tamay \|last5=Heim \|first5=Lennart \|last6=Hobbhahn \|first6=Marius}}</ref>]]▼ {{Commons category\|Wikimedia projects and AI}}▼ == See also == * [[AI slop]] ▲[[File:Models of high-quality language data – (a) Composition of high-quality datasets - The Pile (left), PaLM (top-right), MassiveText (bottom-right).png\|thumb\|Datasets of Wikipedia are widely used for training AI models.<ref>{{cite arXiv \|eprint=2211.04325 \|class=cs.LG \|first1=Pablo \|last1=Villalobos \|first2=Anson \|last2=Ho \|title=Will we run out of data? 
Limits of LLM scaling based on human-generated data \|date=2022 \|last3=Sevilla \|first3=Jaime \|last4=Besiroglu \|first4=Tamay \|last5=Heim \|first5=Lennart \|last6=Hobbhahn \|first6=Marius}}</ref>]] ~~==See also==~~ ▲{{Commons category\|Wikimedia projects and AI}} * [[:mw:ORES\|ORES Mediawiki page]] * [[Wikipedia:Artificial intelligence]]