Wikipedia:WikiProject AI Cleanup/Guide
This is a guide to finding and fixing AI-generated content on Wikipedia.
Spotting AI
Identifying AI-assisted edits is difficult in most cases, since generated text is often indistinguishable from human text. Exceptions include text that contains phrases like "as an AI model" or "as of my last knowledge update", and cases where the editor copy-pasted the prompt together with the AI response. Obvious AI hallucinations are another indication.
- AI content sometimes takes a promotional tone, reading like a tourism website.
- When it lacks more precise information, AI will often describe very generic and common features in detail, for example praising a village for its fertile farmlands, livestock and scenic countryside despite it being in an arid mountain range.
- Other times, the AI gets confused and will write about a hotel instead of a nearby village.
- AI often invents fake references, so check to see if the URLs work and the cited books exist.
  - Example: the article Leninist historiography was entirely written by AI and previously included a list of completely fake sources in Russian and Hungarian at the bottom of the page. Google turned up no results for these sources.
  - Another example: the article Estola albosignata, about a beetle species, had paragraphs written by AI and cited to actual German and French sources. While the cited sources were real, they were completely off-topic, with the French one discussing an unrelated genus of crabs.
- Automatic AI detectors like GPTZero are unreliable and should only ever be used with caution. Given the high rate of false positives, deleting or tagging content purely because it was flagged by an automatic AI detector is not acceptable.
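As a rough aid for the first check described above, leftover chatbot phrases can be searched for mechanically. The following is a minimal sketch, assuming plain article text as input; the function name `find_telltale_phrases` and the phrase list are illustrative (only the two phrases quoted in this guide are included). A hit is only a prompt for human review and is never sufficient grounds for tagging or deletion.

```python
import re

# Phrases quoted in this guide. The list is illustrative, not exhaustive.
TELLTALE_PHRASES = [
    "as an AI model",
    "as of my last knowledge update",
]


def find_telltale_phrases(text):
    """Return (phrase, offset) pairs for each telltale phrase found in text.

    Matching is case-insensitive. Results are sorted by offset.
    """
    hits = []
    for phrase in TELLTALE_PHRASES:
        pattern = re.compile(re.escape(phrase), re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((phrase, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    sample = "As of my last knowledge update, the village had 300 residents."
    for phrase, offset in find_telltale_phrases(sample):
        print(f"offset {offset}: {phrase!r}")
```

An empty result says nothing about whether the text was generated, since most AI-assisted edits contain no such phrases.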
Style
- AI usually capitalizes every word in section titles (title case), which should instead be written in sentence case.
- A "bullet points with bold titles" style, which is virtually unknown on Wikipedia, is very typical of ChatGPT. Often, the content of each bullet point will be a longer rewording of the bolded keyword preceding it.
- ChatGPT will often add a "Conclusion" section, usually arguing for the significance of the subject in a broader context. These sections do not add encyclopedic information, instead being more essay-like and subjective, and should not be present on Wikipedia.
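The three style signs above can be sketched as a simple wikitext scan. This is a heuristic sketch only: the names `style_flags`, `looks_title_case` and `MINOR_WORDS` are hypothetical, and the title-case test will also flag legitimate headings made of proper nouns (such as "History of France"), so every flag needs human review.

```python
import re

# "== Heading ==" through "====== Heading ======" in wikitext.
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
# A list item that opens with '''bold text'''.
BOLD_BULLET_RE = re.compile(r"^\*+\s*'''[^']+?'''")
# Short words that usually stay lowercase even in title case (assumed list).
MINOR_WORDS = {"a", "an", "and", "as", "at", "by", "for", "in",
               "of", "on", "or", "the", "to", "with"}


def looks_title_case(heading):
    """True if every significant word is capitalized (needs two or more)."""
    words = [w for w in heading.split() if w.lower() not in MINOR_WORDS]
    if len(words) < 2:
        return False
    return all(w[0].isupper() for w in words)


def style_flags(wikitext):
    """Return (line_number, reason) pairs for lines worth a second look."""
    flags = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        line = line.strip()
        heading = HEADING_RE.match(line)
        if heading:
            title = heading.group(2)
            if title.lower() == "conclusion":
                flags.append((number, "conclusion section"))
            elif looks_title_case(title):
                flags.append((number, "title-case heading"))
        elif BOLD_BULLET_RE.match(line):
            flags.append((number, "bullet with bold title"))
    return flags


if __name__ == "__main__":
    sample = (
        "== Early Life And Career ==\n"
        "Text.\n"
        "== Early life ==\n"
        "* '''Cultural Significance''': The village matters.\n"
        "== Conclusion ==\n"
    )
    for number, reason in style_flags(sample):
        print(number, reason)
```

The scan reports line numbers rather than editing anything, so that the editor decides whether a flagged heading or list actually needs rewriting.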
Cleaning up
Articles with AI-generated content
Category:Articles containing suspected AI-generated texts contains all articles tagged with the {{AI-generated}} template.
- When an article cites a reference that does not exist, remove the citation and the content that it is cited for.
- Remove citations of questionable sources, and either replace them with reliable sources or remove the content they are cited for.
- Ensure that the article text accurately summarizes the remaining cited sources in an encyclopedic tone.
- If an entire article (or draft) is obviously LLM-generated with no plausible human review, and the page is not worth keeping (e.g. if the topic is not notable), nominate it for speedy deletion under the G15 criterion. If the page is worth keeping, rewrite it or stubify it to remove LLM-generated content.
See the Signs of AI writing page for tips on identifying LLM-generated text. The Unreliable/Predatory Source Detector (UPSD) user script highlights citation links that contain a specific piece of text added by some AI chatbots.
Talk page discussions with LLM-generated messages
It is inappropriate to post LLM-generated messages in talk page discussions, especially without disclosing that they are LLM-generated.
- Use {{cait}} and {{caib}} to collapse discussions that are disruptive due to the use of LLM-generated text.
- Ask the editor who posted the LLM-generated message to express their argument in their own words without using an LLM.
- If the editor continues to post LLM-generated comments after being asked to stop using an LLM, report them to the incidents noticeboard.
Discussion comments can also show signs of AI writing. Do not rely solely on AI content detection tools (such as GPTZero) to determine whether a message is LLM-generated, as these tools have high error rates.
Sources with AI-generated material
Sources produced by machine learning are considered unreliable and should not be cited in articles. The category All articles containing suspected AI-generated sources contains all articles tagged with the {{AI-generated source}} template.
- If a non-notable website that only contains AI-generated content is repeatedly being cited or linked to in articles, request that the website be placed on the spam blacklist.
- Remove citations of AI-generated sources (and the content they are cited for) or tag them with the {{rs}} inline cleanup template.
- To inquire about the reliability of a source that incorporates AI-generated content, start a discussion on the reliable sources noticeboard.
Prominent media outlets that have adopted AI-generated content, such as multiple Red Ventures websites, have had their reliability reassessed after noticeboard discussions.
Warning editors
- {{Uw-ai1}}, {{Uw-ai2}}, {{Uw-ai3}}, {{Uw-ai4}} – for warning users.
- {{Uw-create-ai1}}, {{Uw-create-ai2}} – for new article creations. Use {{Uw-ai3}} for a higher-level warning.