Research:How LLMs impact knowledge production processes

Created: December 2024
Collaborators: Soobin Cho (no affiliation)
Duration: December 2024 – June 2025

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


The advent of large language models (LLMs) has drastically transformed the landscape of knowledge production. While contributors once crafted content word by word, LLMs can now generate well-structured text from a single prompt, drawing on vast datasets of internet-sourced knowledge. As one of the largest peer production platforms, Wikipedia invites everyone to build shared knowledge, and the models' ease of use and time-saving capabilities have led Wikipedia editors to use LLMs in their contributions, including article creation. Prior work has found evidence of a rise in AI-generated content among new Wikipedia articles, examined how LLMs affect knowledge integrity, and investigated contributor perceptions of LLM-supported coordination. Under Wikipedia's guidelines, the use of LLMs is not encouraged, but it is also not entirely banned; editors are expected to be fully transparent about their usage.

However, key questions remain unanswered. Understanding the impact of LLMs will benefit not only the ideals behind collaborative knowledge production, but also downstream activities such as training AI models on established data. We seek to answer this broad question by investigating editors' interactions with LLMs in practice. Specifically, we ask: (1) How do Wikipedians perceive and use LLM tools in the knowledge production process? (2) What are the benefits and limitations of LLMs for future human–AI collaboration in knowledge production? We address these questions through semi-structured interviews with Wikipedia editors who have used LLMs in their edits.

Methods

We are calling for participants for this study.

If you have used LLMs (e.g., GPT, Llama, Claude) when contributing to Wikipedia (e.g., editing Wikipedia articles with LLMs, or using LLMs when interacting with other contributors), we'd love for you to join the study! You will take part in a 45–60 minute interview, reflecting on your experience with Wikipedia and your perception and usage of LLMs on the platform. Your valuable input will not only help us understand practical ways to incorporate LLMs into the knowledge production process, but also help us develop guardrails for these practices. All participation will be anonymous.

To learn more and sign up, please visit https://umn.qualtrics.com/jfe/form/SV_bqIjhNRg9Zqsuvs, or if you have any questions, feel free to email zhou0972<at>umn<dot>edu.

Timeline

Beginning of December: Start recruitment & conduct interviews

Beginning of March: Finish interviews (Goal: 10-11 participants)

End of April: Finish data analysis


Policy, Ethics and Human Subjects Research

This study was approved by the IRB at the University of Minnesota on November 25, 2024, under STUDY00023793. We also recognize that using LLMs, and talking about using them, may feel taboo among Wikipedia editors; we therefore protect participants to the best of our ability through anonymous participation.


References

[1] Arazy, O. et al. 2017. On the “How” and “Why” of Emergent Role Behaviors in Wikipedia. Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (New York, NY, USA, Feb. 2017), 2039–2051.

[2] Ashkinaze, J. et al. 2024. Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms. arXiv.

[3] Ayers, P. et al. 2008. How Wikipedia Works: And how You Can be a Part of it. No Starch Press.

[4] Beschastnikh, I. et al. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. Proceedings of the International AAAI Conference on Web and Social Media. 2, 1 (2008), 27–35. DOI: https://doi.org/10.1609/icwsm.v2i1.18611.

[5] Bipat, T. et al. 2021. Wikipedia Beyond the English Language Edition: How do Editors Collaborate in the Farsi and Chinese Wikipedias? Proc. ACM Hum.-Comput. Interact. 5, CSCW1 (Apr. 2021), 55:1–55:39. DOI: https://doi.org/10.1145/3449129.

[6] Brooks, C. et al. 2024. The Rise of AI-Generated Content in Wikipedia. arXiv.

[7] Bryant, S.L. et al. 2005. Becoming Wikipedian: transformation of participation in a collaborative online encyclopedia. Proceedings of the 2005 international ACM SIGGROUP conference on Supporting group work - GROUP ’05 (Sanibel Island, Florida, USA, 2005), 1.

[8] Butler, B. et al. 2008. Don’t look now, but we’ve created a bureaucracy: the nature and roles of policies and rules in wikipedia. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (New York, NY, USA, Apr. 2008), 1101–1110.

[9] Chou, H. et al. 2020. Understanding Open Collaboration of Wikipedia Good Articles. Social Computing and Social Media. Participation, User Experience, Consumer Experience, and Applications of Social Computing (Cham, 2020), 29–43.

[10] Daxenberger, J. and Gurevych, I. 2012. A Corpus-Based Study of Edit Categories in Featured and Non-Featured Wikipedia Articles. Proceedings of COLING 2012 (Mumbai, India, Dec. 2012), 711–726.