Artificial intelligence in Wikimedia projects

are in [[English Wikipedia]].<ref>{{cite arXiv |title=InfoSync: Information Synchronization across Multilingual Semi-structured Tables |eprint=2307.03313 |last1=Khincha |first1=Siddharth |last2=Jain |first2=Chelsi |last3=Gupta |first3=Vivek |last4=Kataria |first4=Tushar |last5=Zhang |first5=Shuo |date=2023 |class=cs.CL }}</ref>]]
===Generative models===
[[File:Wikipedia - Artificial intelligence in Wikimedia projects (spoken by AI voice).mp3|thumb|Wikipedia articles can be read using AI voice technology.]]
====Text====
In 2022, the public release of [[ChatGPT]] inspired more experimentation with using AI to write Wikipedia articles. It sparked debate about whether, and to what extent, such [[large language model]]s are suitable for the task, given their tendency to [[Hallucination (artificial intelligence)|generate plausible-sounding misinformation]], including fake references; to produce prose that is not encyclopedic in tone; and to [[Algorithmic bias|reproduce biases]].<ref>{{Cite web |last=Harrison |first=Stephen |date=2023-01-12 |title=Should ChatGPT Be Used to Write Wikipedia Articles? |url=https://slate.com/technology/2023/01/chatgpt-wikipedia-articles.html |access-date=2023-01-13 |website=Slate Magazine |language=en}}</ref><ref name="vice"/> {{As of|2023|05}}, a draft Wikipedia policy on ChatGPT and similar large language models (LLMs) recommended that users unfamiliar with LLMs avoid using them, owing to those risks as well as the potential for [[libel]] or [[copyright infringement]].<ref name="vice">{{cite news |last1=Woodcock |first1=Claire |title=AI Is Tearing Wikipedia Apart |url=https://www.vice.com/en/article/v7bdba/ai-is-tearing-wikipedia-apart |work=Vice |date=2 May 2023 |language=en}}</ref>
 
==Using Wikimedia projects for artificial intelligence==
[[File:Models of high-quality language data – (a) Composition of high-quality datasets - The Pile (left), PaLM (top-right), MassiveText (bottom-right).png|thumb|Datasets of Wikipedia are widely used for training AI models.<ref>{{cite arXiv |title=Will we run out of data? Limits of LLM scaling based on human-generated data |eprint=2211.04325 |last1=Villalobos |first1=Pablo |last2=Ho |first2=Anson |last3=Sevilla |first3=Jaime |last4=Besiroglu |first4=Tamay |last5=Heim |first5=Lennart |last6=Hobbhahn |first6=Marius |date=2022 |class=cs.LG }}</ref>]]
Content in Wikimedia projects is useful as a dataset for advancing artificial intelligence research and applications. For instance, Google's [[Perspective API]], which identifies toxic comments in online forums, was developed using a dataset of hundreds of thousands of Wikipedia talk page comments with human-labelled toxicity levels.<ref>{{Cite news|url=https://www.engadget.com/2017/09/01/google-perspective-comment-ranking-system/|title=Google's comment-ranking system will be a hit with the alt-right|work=Engadget|date=2017-09-01}}</ref> Subsets of the Wikipedia corpus are considered the largest well-curated data sets available for AI training.<ref name="nyt180724"/><ref name="considerations"/>
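The talk-page corpus described above pairs free-text comments with human-assigned toxicity labels. The sketch below is purely illustrative of that data shape, not Google's actual pipeline: the comments, labels, keyword lexicon, and `toy_toxicity_score` baseline are all hypothetical stand-ins for a trained classifier.

```python
# Hypothetical sketch of a human-labelled toxicity dataset, modelled loosely on
# the Wikipedia talk-page corpus used to train the Perspective API.
# All data and the "classifier" are illustrative, not the real system.
labelled_comments = [
    {"comment": "Thanks for fixing the citation formatting.", "toxicity": 0.0},
    {"comment": "You are an idiot and your edits are garbage.", "toxicity": 0.9},
    {"comment": "I disagree with this revert; see the talk page.", "toxicity": 0.1},
]

TOXIC_KEYWORDS = {"idiot", "garbage", "stupid"}  # toy lexicon, not the real model


def toy_toxicity_score(comment: str) -> float:
    """Naive keyword baseline standing in for a trained classifier."""
    words = {w.strip(".,;!?").lower() for w in comment.split()}
    hits = len(words & TOXIC_KEYWORDS)
    return min(1.0, hits / 2)


# Score every labelled comment so predictions can be compared with labels.
for item in labelled_comments:
    item["predicted"] = toy_toxicity_score(item["comment"])
```

In the real setting, the human labels serve as supervision targets and a machine-learned model replaces the keyword lookup; the dataset structure, however, is essentially this simple pairing of text and score.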