Artificial intelligence in Wikimedia projects

are in [[English Wikipedia]].<ref>{{cite arXiv |title=InfoSync: Information Synchronization across Multilingual Semi-structured Tables |eprint=2307.03313 |last1=Khincha |first1=Siddharth |last2=Jain |first2=Chelsi |last3=Gupta |first3=Vivek |last4=Kataria |first4=Tushar |last5=Zhang |first5=Shuo |date=2023 |class=cs.CL }}</ref>]]
===Generative models===
[[File:Wikipedia - Artificial intelligence in Wikimedia projects (spoken by AI voice).mp3|thumb|Wikipedia articles can be read using AI voice technology.]]
====Text====
In 2022, the public release of [[ChatGPT]] inspired more experimentation with using AI to write Wikipedia articles. It sparked debate about whether, and to what extent, such [[large language model]]s are suitable for the task, given their tendency to [[Hallucination (artificial intelligence)|generate plausible-sounding misinformation]], including fake references; to produce prose that is not encyclopedic in tone; and to [[Algorithmic bias|reproduce biases]].<ref>{{Cite web |last=Harrison |first=Stephen |date=2023-01-12 |title=Should ChatGPT Be Used to Write Wikipedia Articles? |url=https://slate.com/technology/2023/01/chatgpt-wikipedia-articles.html |access-date=2023-01-13 |website=Slate Magazine |language=en}}</ref><ref name="vice"/> {{As of|2023|05}}, a draft Wikipedia policy on ChatGPT and similar large language models (LLMs) recommended that users unfamiliar with LLMs avoid using them, owing to those risks as well as the potential for [[libel]] or [[copyright infringement]].<ref name="vice">{{cite news |last1=Woodcock |first1=Claire |title=AI Is Tearing Wikipedia Apart |url=https://www.vice.com/en/article/v7bdba/ai-is-tearing-wikipedia-apart |work=Vice |date=2 May 2023 |language=en}}</ref>
 
==Using Wikimedia projects for artificial intelligence==
[[File:Models of high-quality language data – (a) Composition of high-quality datasets - The Pile (left), PaLM (top-right), MassiveText (bottom-right).png|thumb|Datasets of Wikipedia are widely used for training AI models.<ref>{{cite arXiv |title=Will we run out of data? Limits of LLM scaling based on human-generated data |eprint=2211.04325 |last1=Villalobos |first1=Pablo |last2=Ho |first2=Anson |last3=Sevilla |first3=Jaime |last4=Besiroglu |first4=Tamay |last5=Heim |first5=Lennart |last6=Hobbhahn |first6=Marius |date=2022 |class=cs.LG }}</ref>]]
Content in Wikimedia projects is useful as a dataset for advancing artificial intelligence research and applications. For instance, Google's [[Perspective API]], which identifies toxic comments in online forums, was developed using a dataset of hundreds of thousands of Wikipedia talk page comments with human-labelled toxicity levels.<ref>{{Cite news|url=https://www.engadget.com/2017/09/01/google-perspective-comment-ranking-system/|title=Google's comment-ranking system will be a hit with the alt-right|work=Engadget|date=2017-09-01}}</ref> Subsets of the Wikipedia corpus are considered the largest well-curated data sets available for AI training.<ref name="nyt180724"/><ref name="considerations"/>
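The talk-page corpus described above pairs free-text comments with human-assigned toxicity labels. The sketch below is purely illustrative of that data shape, not Google's actual pipeline: the comments, labels, keyword lexicon, and `toy_toxicity_score` baseline are all hypothetical stand-ins for a trained classifier.

```python
# Hypothetical sketch of a human-labelled toxicity dataset, modelled loosely on
# the Wikipedia talk-page corpus used to train the Perspective API.
# All data and the "classifier" are illustrative, not the real system.
labelled_comments = [
    {"comment": "Thanks for fixing the citation formatting.", "toxicity": 0.0},
    {"comment": "You are an idiot and your edits are garbage.", "toxicity": 0.9},
    {"comment": "I disagree with this revert; see the talk page.", "toxicity": 0.1},
]

TOXIC_KEYWORDS = {"idiot", "garbage", "stupid"}  # toy lexicon, not the real model


def toy_toxicity_score(comment: str) -> float:
    """Naive keyword baseline standing in for a trained classifier."""
    words = {w.strip(".,;!?").lower() for w in comment.split()}
    hits = len(words & TOXIC_KEYWORDS)
    return min(1.0, hits / 2)


# Score every labelled comment so predictions can be compared with labels.
for item in labelled_comments:
    item["predicted"] = toy_toxicity_score(item["comment"])
```

In the real setting, the human labels serve as supervision targets and a machine-learned model replaces the keyword lookup; the dataset structure, however, is essentially this simple pairing of text and score.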