The Problem with False Positives: AI Detection Unfairly Accuses Scholars of AI Plagiarism
Louie Giray
To cite this article: Louie Giray (12 Dec 2024): The Problem with False Positives:
AI Detection Unfairly Accuses Scholars of AI Plagiarism, The Serials Librarian, DOI:
10.1080/0361526X.2024.2433256
ABSTRACT
By examining the experiences of scholars from various institutions around the globe, this paper looks into how AI detection tools, which are meant to keep academic integrity intact, may backfire by unfairly accusing scholars of AI plagiarism. The paper shows that false positives disproportionately affect non-native English speakers and scholars with distinctive writing styles, resulting in unwarranted accusations that may cause significant harm to their academic careers. It also identifies several critical issues with current AI detection tools, including algorithmic biases, vulnerabilities to manipulation, and a lack of understanding of context. These shortcomings not only make AI detection tools less effective but also create a climate of anxiety and distrust within academic communities. While scholars are dedicated to maintaining integrity in their writing, AI detection tools may tarnish their reputations by labeling them as cheaters or frauds based on flawed detection results. Therefore, institutions should set clear guidelines and limitations on how AI and AI detection may be used in scholarly work. They should also ensure that scholars remain transparent about any AI involvement by including declarations on how it contributed to their writing process. While AI detection tools have value, a healthy skepticism is needed due to the risk of false positives and other limitations. Indeed, AI detection should complement human decision-making, not replace it. This approach could help create a fairer academic environment that balances innovation with the integrity scholars strive for in their work.
KEYWORDS
AI detection; AI plagiarism; academic integrity; algorithmic bias; false positives; scholarly writing
Prologue
An early career researcher has spent months researching and writing a paper that they are proud of –
only to have it flagged by an AI detection tool as potentially plagiarized. Sounds like a nightmare,
right? This scenario is becoming increasingly common in academia, as more institutions adopt AI
detection tools to ensure academic integrity. While these tools supposedly aim to uphold standards,
they often miss the mark. This leads to a disturbing problem: AI detection tools, which may
generate false positives, unfairly accuse scholars of AI plagiarism.
In fact, AI detection tools have been at the center of controversy, with several reports of false
accusations of cheating against students, including international students.1 Cases involving Turnitin
AI Writing Detection have gained attention.2 What’s more troubling is that these detectors often
mislabel text written by non-native English speakers as AI-generated, showing a clear bias.3 For this
reason, some universities, concerned about the tools’ accuracy, have chosen to stop using them.4
AI detection tools are meant to help institutions identify potential plagiarism.5 Their goal is to
ensure academic integrity and give credit to original ideas. But these tools are not perfect and rely on
patterns and statistical analyses, which often fail to distinguish between legitimate academic writing
and potential violations.6 As a result, hard-working scholars may face damaging accusations, which
threaten both their credibility and careers.
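One widely discussed family of detectors works by scoring how statistically predictable a passage is under a language model and flagging low-perplexity text as machine-like. The sketch below is a minimal illustration of that idea only; the model (GPT-2 loaded through the Hugging Face transformers library) and the cutoff value are arbitrary stand-ins for illustration, not the pipeline of any commercial product.

# Illustrative sketch of a perplexity-style heuristic, not a production detector.
# The model choice and threshold are hypothetical stand-ins.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    # Lower perplexity means the language model finds the text more predictable.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

FLAG_THRESHOLD = 40.0  # hypothetical cutoff, not a published figure

def looks_ai_generated(text: str) -> bool:
    return perplexity(text) < FLAG_THRESHOLD

# Formulaic but entirely human phrasing (common in methods sections, or in careful
# non-native prose) can fall below the cutoff just as easily as machine output.
sample = ("The results of the experiment are presented in Table 1. "
          "The data were analyzed using standard statistical methods.")
print(perplexity(sample), looks_ai_generated(sample))

Because such a score reflects predictability rather than provenance, disciplined academic prose, which is deliberately conventional, sits structurally close to the flagging threshold. That is one plausible source of the false positives discussed below.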
The problem is further complicated by inherent biases within AI algorithms. These systems, trained
on pre-existing data, can unknowingly perpetuate biases present in the academic world.7 For instance,
a scholar from a non-Western background may find their work scrutinized more harshly than their
Western peers simply because the AI detector is unfamiliar with their writing style or cultural context.
This leads to an uneven playing field where diverse voices are unfairly silenced, and important
contributions are ignored.
So, in this paper, we take a closer look at the increasing use of AI detection tools and ask a key
question: Who are we trusting to maintain academic integrity, and what’s the real cost? The voices of
scholars, who are directly affected by these tools, are at the core of this issue. Yet, their experiences
haven’t been given enough attention, despite the serious impact false accusations can have on their
academic paths and careers. By exploring their experiences, I aim to highlight the human consequences of these technological flaws and push for more fairness and better solutions in academic
settings.
AI plagiarism
Plagiarism is when someone uses another person’s work, ideas, or intellectual property without giving
credit, pretending it’s their own.12 This shady practice can show up in a few different ways, like
copying text, paraphrasing without citation, or not acknowledging someone’s ideas or research
findings. In academic and professional circles, plagiarism may damage trust, credibility, and the
overall integrity of scholarly work.13 The fallout can be serious, ranging from a tarnished reputation to academic penalties and even legal trouble.
Recently, the rise of artificial intelligence has added a new twist to the plagiarism conversation –
enter AI plagiarism. This term describes using AI tools to whip up content that gets submitted as
original work, all without giving proper credit.14 With large language models like ChatGPT creating
human-like text, it’s tempting for people to rely on AI for quick writing. But this brings up some
ethical concerns about authenticity and integrity.
So, when do people end up committing AI plagiarism? It often happens in a few common
situations. For instance, students might use AI to crank out essays or assignments and then submit
them as their own, totally leaving out the fact that AI helped.15 Researchers might also tap into AI
to draft reports or pull together information, not acknowledging that the AI played a role in the
process.16 This may mislead others about who really authored the content. These scenarios
emphasize the need for clear guidelines and ethical standards when it comes to using AI in
academic and professional settings.
False positives can be a nightmare, especially for scholars with unconventional or distinctive
writing styles. Take Jane, a Coordinator Expert in Forensics from Portugal, for example: “I wrote
two paragraphs for an article and tested it for AI. It came out 99% AI on a detector, possibly due to my
autistic writing style and the topic (hypermasculinity in wildfire culture).” To clarify, an autistic writing style refers to a way of expressing thoughts and ideas that may be characteristic of individuals on the autism spectrum. Stories like Jane’s highlight how these tools struggle with diverse
writing styles.
The issue of false positives is even worse for non-native English speakers. Charlie, an Assistant
Professor in the Humanities at Charles University, Czechia, shares that “The best people I know were
triggering the AI detection software in their academic writing. They are non-native English speakers.
I know they are not copying because of how it’s crafted. It’s just their sentence structure that gets
flagged.” This raises some questions about how fair and inclusive these tools are, as they seem to
unfairly target non-native speakers who naturally write differently from native speakers.
Saitama, a Program Director in Rhetoric at Stony Brook University, USA, shares his thoughts on
the issue: “AI detectors don’t seem to care about the unfairness faced by non-native English speakers!
Credit should go where it’s due, unless this is just more virtue signaling, which I see everywhere.” His
comment points out that these AI detection tools might not just be technically off-base but also
culturally biased, putting scholars from different language backgrounds at a disadvantage.
Getting falsely accused of using AI to create content is no joke, and fighting back against such
claims can be pretty tough.20 Jann, a Postgraduate Researcher in Marine Geology at Oregon State
University, USA, expresses this worry: “What do you do if you have to paraphrase a paraphrase, and
that gets flagged as AI? It’s already a headache when paraphrasing your own work triggers plagiarism
checks. How do you even challenge a false accusation? Do journals have a process for this? I’m not
really sure how it all works.” The lack of clear rules for dealing with false positives makes things even
trickier. This leaves scholars in a tough spot.
Tan, a PhD student in Physics at Mersin University, Turkey, shares a personal story that highlights
how hard it can be to defend against these accusations: “About seven or eight months ago, I was
finishing up my PhD thesis and sent the draft to my supervisor for feedback. He claimed I used AI to
write it. I ended up showing him an article I wrote back in 2019 that the AI falsely claimed was
generated by it. It was pretty satisfying to embarrass him!” Tan’s experience shows just how difficult it
can be to prove innocence when unreliable AI detection tools are involved.
As AI continues to evolve, there’s a growing need for more accurate and reliable detection tools.
Wyatt, a graduate student at the University of Toronto, Canada, points out that “ChatGPT and other
LLMs are probably going to force a change in approach that universities aren’t ready for.” He adds that
whole teaching methods, especially in the humanities and social sciences, will have to adapt to this new
technology. While AI detection tools might seem like a quick and easy fix for many, we need to recognize their limitations and the potential harm they can cause.
Matt, a postdoctoral researcher in political science at Vrije Universiteit Brussel, Belgium, raises an
interesting point: “Why would we be able to? To anyone with a minimal understanding of the
technology involved, it’s obvious these programs have started an arms race.” This back-and-forth
struggle between AI developers and detection tools underscores the challenge of keeping pace with
rapidly advancing AI technology. As AI-generated content gets more sophisticated, detection tools
must step up to effectively distinguish between human and AI writing.
Algorithmic biases
One of the biggest issues with AI detection tools is the presence of algorithmic biases. These biases,
often baked into the algorithms, can unfairly target specific writing styles or even cultural differences
in language use. For instance, a detection tool might be more likely to flag work written by non-native
English speakers, simply because their sentence structures differ from standard English conventions.
This bias mirrors what Steele and Aronson called “stereotype threat,” where people from specific
groups are unfairly judged based on flawed systems. Instead of leveling the playing field, AI tools can
unintentionally exacerbate these disparities.
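To make that mechanism concrete, the toy sketch below uses entirely synthetic data and a generic scikit-learn classifier to show how a detector trained mostly on one style of human writing can hand a much higher false-positive rate to human writing in an under-represented style. The feature names and all numbers here are invented for illustration; they do not describe any real detector.

# Synthetic demonstration of how skewed training data skews false positives.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, burstiness_mean, rare_word_mean):
    # Two toy stylometric features: sentence-length variance ("burstiness")
    # and rate of uncommon vocabulary. Purely illustrative numbers.
    return np.column_stack([
        rng.normal(burstiness_mean, 1.0, n),
        rng.normal(rare_word_mean, 1.0, n),
    ])

# Training data: AI-generated text (label 1) vs. human text drawn almost
# entirely from one writing tradition (label 0).
X_ai = sample(500, burstiness_mean=2.0, rare_word_mean=2.0)
X_native = sample(500, burstiness_mean=5.0, rare_word_mean=5.0)
clf = LogisticRegression().fit(np.vstack([X_ai, X_native]),
                               np.array([1] * 500 + [0] * 500))

# Human writing in a style under-represented in training: shorter, more
# formulaic sentences, closer to the "AI" region of feature space.
X_native_test = sample(1000, 5.0, 5.0)
X_nonnative_test = sample(1000, 3.0, 3.0)

print("Flagged, native-style human text:    ",
      f"{clf.predict(X_native_test).mean():.1%}")
print("Flagged, non-native-style human text:",
      f"{clf.predict(X_nonnative_test).mean():.1%}")

The exact figures are meaningless; the structural point is that whichever group’s legitimate writing sits closer to the model’s picture of machine-like text absorbs most of the false accusations.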
Over-reliance on technology
A major concern is the increasing over-reliance on AI technology in assessing scholarly work.27
There’s a risk that institutions might begin to prioritize AI judgments over human ones. However,
humans, with their ability to understand subtleties, often see the intention behind a text that an AI
system might miss.28 Relying too heavily on these tools is like using a GPS that occasionally
misdirects – it’s helpful, but only if one doesn’t turn off their own sense of direction. Over-dependence on AI can compromise the richness of human judgment and academic fairness.
Once acceptable AI use is clearly defined, AI shaming must not be tolerated, as what matters most is that scholars feel
psychologically safe in their academic environment. By implementing these guidelines, we can
create a fairer and more supportive academic atmosphere that embraces AI while upholding
integrity.
Healthy skepticism
It is vital that we cultivate a healthy skepticism33 regarding the results of AI detection tools due to
the inherent risk of false positives. As these tools become more integrated into academic and
professional settings, careful consideration of their accuracy is essential. Mislabeling genuine
human work as AI-generated can lead to significant consequences, including reputational harm
for individuals and a loss of trust in educational institutions. By fostering a critical mindset toward
the outputs of AI detection tools, we may better navigate their limitations and ensure that
assessments are fair and reliable.
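A simple base-rate calculation, with illustrative numbers rather than the measured performance of any particular tool, shows why even a seemingly small false-positive rate warrants this skepticism: when genuine misuse is rare, a large share of flagged authors can be innocent.

# Illustrative base-rate arithmetic; the three inputs are assumptions, not vendor figures.
prior_ai = 0.05        # assume 5% of submissions are genuinely AI-written
sensitivity = 0.90     # detector catches 90% of AI-written texts
false_positive = 0.03  # detector wrongly flags 3% of human-written texts

p_flag = prior_ai * sensitivity + (1 - prior_ai) * false_positive
p_human_given_flag = (1 - prior_ai) * false_positive / p_flag

print(f"Share of all submissions flagged:      {p_flag:.1%}")        # about 7.4%
print(f"Share of flagged texts that are human: {p_human_given_flag:.1%}")  # about 38.8%

Under these assumptions, nearly two in five flagged submissions would come from authors who did nothing wrong, and the rarer genuine misuse is, the worse that ratio gets. This is precisely why a flag should be read as a prompt for further review rather than as a verdict.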
Human oversight
AI detection tools are valuable, but they can’t replace human judgment. Educators and experts play an essential role in reviewing flagged work, bringing in their understanding of context, individuals, and the
nature of academic work. Institutions need to foster a culture where AI tools, especially AI writing
detectors, assist, but do not control, decision-making. This collaborative effort between AI and
administrators ensures that individuals are fairly assessed,35 while also maintaining the integrity of
the educational system.
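As a minimal sketch of what that division of labor could look like in practice, the routine below routes a detector score into a review workflow without ever issuing a verdict on its own. The threshold, field names, and routing outcomes are hypothetical, not drawn from any institution’s actual policy.

# Hypothetical triage sketch: the detector informs routing, humans decide outcomes.
from dataclasses import dataclass

@dataclass
class Submission:
    author: str
    detector_score: float  # 0.0-1.0 likelihood reported by a detection tool
    ai_use_declared: bool  # the author's own declaration of AI involvement

REVIEW_THRESHOLD = 0.80    # hypothetical policy value, not a vendor setting

def triage(sub: Submission) -> str:
    """Route a submission; note that no branch imposes a penalty automatically."""
    if sub.detector_score < REVIEW_THRESHOLD:
        return "no action"
    if sub.ai_use_declared:
        # Declared AI assistance is a disclosure question, not misconduct.
        return "check declaration against the institution's AI-use policy"
    # High score with no declaration: gather evidence and open a dialogue.
    return "refer to a human committee; invite the author to discuss drafts and history"

print(triage(Submission("early-career researcher", 0.99, ai_use_declared=False)))

The design choice that matters here is that the score only changes who looks at the work next, never what happens to its author.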
Epilogue
AI detection tools in academia are supposed to uphold academic integrity, but they often end up being
a double-edged sword. Their high rate of false positives means that scholars, especially non-native English speakers and those with unique writing styles, may be unjustly accused. The stories shared by affected scholars
highlight a pressing need for more accurate and inclusive AI detection methods and organizational
guidelines. It’s ironic that while scholars work hard to maintain integrity in their writing, these tools
can end up tarnishing their reputations by labeling them as cheaters or frauds based on flawed
assessments.
To tackle these challenges, everyone involved – universities, developers, and scholars – needs to
come together and collaborate. Universities should push for better detection tools that minimize false
positives and create clear policies on how to deal with flagged content. Developers must focus on
improving algorithms to enhance accuracy and cut down on bias. Meanwhile, scholars should have
open conversations about the ethical implications of AI in scholarly work.
By working together, we can create a fairer academic environment that leverages the potential of AI
while protecting the integrity and well-being of all scholars. Let’s make sure that the limitations of AI
don’t dictate the future of academia; instead, let’s advocate for solutions that promote fairness and
support everyone involved.
Notes
1. Tara Garcia Mathewson, “AI Detection Tools Falsely Accuse International Students of Cheating,” The Markup, last modified August 14, 2023, https://themarkup.org/machine-learning/2023/08/14/ai-detection-tools-falsely-accuse-international-students-of-cheating (accessed October 30, 2024).
2. Miles Klee, “She Was Falsely Accused of Cheating with AI – And She Won’t Be the Last,” Rolling Stone, last modified June 6, 2023, https://www.rollingstone.com/culture/culture-features/student-accused-ai-cheating-turnitin-1234747351 (accessed October 30, 2024).
3. Andrew Myers, “AI-Detectors Biased Against Non-Native English Writers,” Stanford University Institute for Human-Centered Artificial Intelligence, last modified May 15, 2023, https://hai.stanford.edu/news/ai-detectors-biased-against-non-native-english-writers (accessed October 30, 2024).
4. Tom Carter, “Some Universities Are Ditching AI Detection Software Amid Fears Students Could Be Falsely Accused of Cheating by Using ChatGPT,” Business Insider, last modified September 22, 2023, https://www.businessinsider.com/universities-ditch-ai-detectors-over-fears-students-falsely-accused-cheating-2023-9 (accessed October 30, 2024).
5. Mohammad Khalil and Erkan Er, “Will ChatGPT Get You Caught? Rethinking of Plagiarism Detection,” in International Conference on Human-Computer Interaction, ed. Panayiotis Zaphiris and Andri Ioannou (Cham: Springer Nature Switzerland, 2023), 475–87, https://doi.org/10.1007/978-3-031-34411-4_32.
6. Geoffrey M. Currie, “Academic Integrity and Artificial Intelligence: Is ChatGPT Hype, Hero, or Heresy?” Seminars in Nuclear Medicine 53, no. 5 (2023): 719–73, https://doi.org/10.1053/j.semnuclmed.2023.04.008.
7. Louie Giray, “AI Shaming: The Silent Stigma Among Academic Writers and Researchers,” Annals of Biomedical Engineering 52, no. 9 (2024): 2319–24, https://doi.org/10.1007/s10439-024-03582-1; Louie Giray, Jomarie Jacob, and Daxjhed Louis Gumalin, “Strengths, Weaknesses, Opportunities, and Threats of Using ChatGPT in Scientific Research,” International Journal of Technology in Education 7, no. 1 (2024): 40–58, https://doi.org/10.46328/ijte.618.
8. Sheetal Kusal, Shruti Patil, Ketan Kotecha, Rajanikanth Aluvalu, and Vijayakumar Varadarajan, “AI-Based Emotion Detection for Textual Big Data: Techniques and Contribution,” Big Data and Cognitive Computing 5, no. 3 (2021): 43, https://doi.org/10.3390/bdcc5030043.
9. Zhenyu Xu and Victor S. Sheng, “Detecting AI-Generated Code Assignments Using Perplexity of Large Language Models,” Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (2024): 23155–62, https://doi.org/10.1609/aaai.v38i21.30361.
10. Xiaomeng Hu, Pin-Yu Chen, and Tsung-Yi Ho, “Radar: Robust AI-Text Detection via Adversarial Learning,” Advances in Neural Information Processing Systems 36 (2023): 15077–95.
11. Preetam Amrit and Amit Kumar Singh, “Survey on Watermarking Methods in the Artificial Intelligence Domain and Beyond,” Computer Communications 188 (2022): 52–65, https://doi.org/10.1016/j.comcom.2022.02.023.
12. Raj Kishor Kampa, Dhirendra Kumar Padhan, Nalini Karna, and Jayaram Gouda, “Identifying the Factors Influencing Plagiarism in Higher Education: An Evidence-Based Review of the Literature,” Accountability in Research (2024): 1–16, https://doi.org/10.1080/08989621.2024.2311212.
13. Robert Mulenga and Helvi Shilongo, “Academic Integrity in Higher Education: Understanding and Addressing Plagiarism,” Acta Pedagogia Asiana 3, no. 1 (2024): 30–43, https://doi.org/10.53623/apga.v3i1.337.
14. A. Steponenaite and B. Barakat, “Plagiarism in an AI-Empowered World,” in Universal Access in Human-Computer Interaction. HCII 2023. Lecture Notes in Computer Science 14021, ed. Margherita Antona and Constantine Stephanidis (Cham: Springer, 2023), 434–42, https://doi.org/10.1007/978-3-031-35897-5_31.
15. James Hutson, “Rethinking Plagiarism in the Era of Generative AI,” Journal of Intelligent Communication 4, no. 1 (2024): 20–31, https://doi.org/10.54963/jic.v4i1.220.
16. Joseph Dien, “Generative Artificial Intelligence as a Plagiarism Problem,” Biological Psychology (2023): 108621, https://doi.org/10.1016/j.biopsycho.2023.108621.
17. Karim Ibrahim, “Using AI-Based Detectors to Control AI-Assisted Plagiarism in ESL Writing: ‘The Terminator Versus the Machines’,” Language Testing in Asia 13, no. 1 (2023): 46, https://doi.org/10.1186/s40468-023-00260-2.
18. Valentina Bellini, Federico Semeraro, Jonathan Montomoli, Marco Cascella, and Elena Bignami, “Between Human and AI: Assessing the Reliability of AI Text Detection Tools,” Current Medical Research and Opinion 40, no. 3 (2024): 353–8, https://doi.org/10.1080/03007995.2024.2310086.
19. Nash Anderson, Daniel L. Belavy, Stephen M. Perle, Sharief Hendricks, Luiz Hespanhol, Evert Verhagen, and Aamir R. Memon, “AI Did Not Write This Manuscript, or Did It? Can We Trick the AI Text Detector into Generated Texts? The Potential Future of ChatGPT and AI in Sports & Exercise Medicine Manuscript Generation,” BMJ Open Sport & Exercise Medicine 9, no. 1 (2023): e001568, https://doi.org/10.1136/bmjsem-2023-001568.
20. Tim Gorichanaz, “Accused: How Students Respond to Allegations of Using ChatGPT on Assessments,” Learning: Research and Practice 9, no. 2 (2023): 183–96, https://doi.org/10.1080/23735082.2023.2254787.
21. Claude M. Steele and Joshua Aronson, “Stereotype Threat and the Intellectual Test Performance of African Americans,” Journal of Personality and Social Psychology 69, no. 5 (1995): 797–811, https://doi.org/10.1037/0022-3514.69.5.797.
22. Amy Edmondson, The Fearless Organization: Creating Psychological Safety in the Workplace for Learning, Innovation, and Growth, 1st ed. (John Wiley & Sons, 2019), 15–19.
23. Yves Gendron, Jane Andrew, and Christine Cooper, “The Perils of Artificial Intelligence in Academic Publishing,” Critical Perspectives on Accounting 87 (2022): 102411, https://doi.org/10.1016/j.cpa.2021.102411.
24. Joel B. Carnevale, Lei Huang, Mary Uhl‐Bien, and Stanley Harris, “Feeling Obligated Yet Hesitant to Speak Up: Investigating the Curvilinear Relationship Between LMX and Employee Promotive Voice,” Journal of Occupational and Organizational Psychology 93, no. 3 (2020): 505–29, https://doi.org/10.1111/joop.12302.
25. Doraid Dalalah and Osama M. A. Dalalah, “The False Positives and False Negatives of Generative AI Detection Tools in Education and Academic Research: The Case of ChatGPT,” The International Journal of Management Education 21, no. 2 (2023): 100822, https://doi.org/10.1016/j.ijme.2023.100822.
26. Scott Hetherington, “Teaching the Limitations of AI as a Writing Tool,” Agora 59, no. 2 (2024): 37–40.
27. Artur Klingbeil, Cassandra Grützner, and Philipp Schreck, “Trust and Reliance on AI: An Experimental Study on the Extent and Costs of Overreliance on AI,” Computers in Human Behavior 160 (2024): 108352, https://doi.org/10.1016/j.chb.2024.108352.
28. Mark Steyvers and Aakriti Kumar, “Three Challenges for AI-Assisted Decision-Making,” Perspectives on Psychological Science 19, no. 5 (2024): 722–34, https://doi.org/10.1177/17456916231181102.
29. Steve LeBlanc, “Parents of Massachusetts High Schooler Disciplined for Using AI Sue School,” AP News, last modified October 23, 2024, https://apnews.com/article/high-school-student-lawsuit-artificial-intelligence-8f1283b517b2ed95c2bac63f9c5cb0b9 (accessed October 30, 2024).
30. Michail E. Klontzas, Anthony A. Gatti, Ali S. Tejani, and Charles E. Kahn Jr., “AI Reporting Guidelines: How to Select the Best One for Your Research,” Radiology: Artificial Intelligence 5, no. 3 (2023): e230055, https://doi.org/10.1148/ryai.230055.
31. Vinayak Pillai, “Enhancing Transparency and Understanding in AI Decision-Making Processes,” Iconic Research and Engineering Journals 8, no. 1 (2024): 168–72.
32. Louie Giray, “Authors Should Be Held Responsible for Artificial Intelligence Hallucinations and Mistakes in Their Papers,” Journal of the Practice of Cardiovascular Sciences 9, no. 2 (2023): 161–63, https://doi.org/10.4103/jpcs.jpcs_45_23.
33. Jianing Li, “Not All Skepticism is ‘Healthy’ Skepticism: Theorizing Accuracy- and Identity-Motivated Skepticism Toward Social Media Misinformation,” New Media & Society (2023): 14614448231179941, https://doi.org/10.1177/14614448231179941.
34. Stefan Larsson and Fredrik Heintz, “Transparency in Artificial Intelligence,” Internet Policy Review 9, no. 2 (2020): 1–16, https://doi.org/10.14763/2020.2.1469.
35. Teresa Luther, Joachim Kimmerle, and Ulrike Cress, “Teaming Up with an AI: Exploring Human–AI Collaboration in a Writing Scenario with ChatGPT,” AI 5, no. 3 (2024): 1357–76, https://doi.org/10.3390/ai5030065.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Ethical approval
This paper involves anonymized records and data sets that exist in the public domain, and thus does not require any
ethical approval.