ChatGPT Detectors: The Underlying Biases in Chatbot Detection Algorithms
With the rise of generative AI unfolding in areas many regarded as "safe" from machine learning, the case for authenticating creative work has become increasingly relevant. Alongside AI-generated images from tools such as DALL-E 2 and Stable Diffusion, AI-written text is often indistinguishable from human writing, causing great alarm, especially in academia. As the world adjusts to this era of instant media, many are turning to detection algorithms for a better sense of clarity.
The Rise of Chatbot Detection Algorithms
Gaining over 100 million monthly active users just two months after launch, ChatGPT reigns as the top AI-powered chatbot. Its most recent model, GPT-4, was trained on Microsoft's Azure AI supercomputers and passed a simulated bar exam at the 90th percentile. This makes it a formidable system to distinguish from human-written essays and scripts: with a simple string of prompts, ChatGPT can simplify, enhance, or even match the tone of a specified author within seconds. With such publicly available tools able to both conceptualize an idea and produce an aligned output, the traditional method of assessing students through essay work is threatened at its core, leaving teachers searching for solutions. But what happens when these seemingly infallible, hyper-knowledgeable systems produce hallucinations, especially biased ones?
Turnitin, GPTZero, and OpenAI itself are among the many that aim to identify AI-like tendencies in writing, highlight and score an essay in its entirety, and estimate whether the work is likely to be written, or partially written, by AI. Several methodologies have been adopted to classify texts; for instance, GPTZero's algorithm assesses "perplexity and burstiness," using the randomness and variety of sentence structure to ascertain whether there is enough "disuniformity" for the text to be human-written.
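GPTZero's actual model is proprietary, but the underlying intuition can be illustrated with a toy sketch. The snippet below is an assumption-laden illustration, not GPTZero's method: it scores a text under a simple Laplace-smoothed unigram model fitted to the text itself, and measures "burstiness" as the standard deviation of per-sentence perplexities, so uniform, low-variance writing registers as less "bursty":

```python
import math
import re
from collections import Counter

def unigram_perplexity(sentence, freqs, total, vocab_size):
    """Per-word perplexity of one sentence under a Laplace-smoothed
    unigram model (freqs/total estimated from the whole text)."""
    words = sentence.lower().split()
    log_prob = sum(
        math.log((freqs[w] + 1) / (total + vocab_size)) for w in words
    )
    return math.exp(-log_prob / len(words))

def perplexity_and_burstiness(text):
    """Return (mean sentence perplexity, burstiness), where burstiness
    is the standard deviation of per-sentence perplexities."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.lower().split()
    freqs = Counter(words)
    total, vocab_size = len(words), len(freqs)
    scores = [
        unigram_perplexity(s, freqs, total, vocab_size) for s in sentences
    ]
    mean = sum(scores) / len(scores)
    variance = sum((x - mean) ** 2 for x in scores) / len(scores)
    return mean, math.sqrt(variance)
```

Real detectors estimate perplexity with large neural language models rather than unigram counts, but the intuition is the same: text whose sentences are all uniformly predictable looks more machine-generated than text with occasional surprising, high-perplexity sentences.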
While applicable in the business world, the majority of detectors are used in academia to detect and deter students who may be using ChatGPT and related bots to complete assignments. Confirming authentic learning and safeguarding the quality of diplomas is key, but in recent months these detection algorithms have been found to produce high rates of false positives for specific demographics. Moreover, many companies use misleading statistics and branding to encourage the use of their sites, leading to false and potentially devastating accusations. Since these accuracy claims are vital to the legitimacy of the platforms, a verifiable standard for the fairness of these models is called for, one that can be found in the few available AI risk management services such as Calvin Risk.
Case Study: The Penalization of Non-Native English Writers' Work
In a study conducted by Stanford researchers, using a sample of 91 human-written TOEFL essays from a Chinese educational forum and 88 US 8th-grade ASAP essays, it was found that non-native English speakers received a false positive rate far surpassing that of native speakers.
Full study here
In most instances, more than half of the non-native TOEFL essays were flagged as "AI-generated," with an average false positive rate of 61.22%. Notably, 18 of the 91 TOEFL essays were unanimously classified as "AI-generated" by all tested ChatGPT detectors, while 89 of the 91 were flagged by at least one detector. Researchers noted that the unanimously flagged essays had significantly lower perplexity than the others (p = 9.74E-05). Thus, non-native speakers, whose writing tends toward more limited linguistic expression and sentence variability, suffered an unacceptable Type I error rate.
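The arithmetic behind figures like these is straightforward to sketch. The snippet below, using made-up verdicts rather than the study's data, computes per-detector false positive rates, their average, and the "unanimously flagged" and "flagged by at least one" counts from a table of detector verdicts on essays known to be human-written:

```python
def false_positive_stats(verdicts):
    """verdicts[i][d] is True if detector d flagged human-written essay i
    as AI-generated; every True is therefore a false positive."""
    n_essays = len(verdicts)
    n_detectors = len(verdicts[0])
    # False positive rate of each detector: flagged essays / total essays.
    fprs = [
        sum(essay[d] for essay in verdicts) / n_essays
        for d in range(n_detectors)
    ]
    avg_fpr = sum(fprs) / n_detectors
    unanimous = sum(all(essay) for essay in verdicts)     # flagged by all
    at_least_one = sum(any(essay) for essay in verdicts)  # flagged by any
    return fprs, avg_fpr, unanimous, at_least_one

# Hypothetical verdicts for 4 human-written essays across 3 detectors.
verdicts = [
    [True, True, True],     # unanimously (wrongly) flagged
    [True, False, True],
    [False, False, False],  # correctly passed by all detectors
    [True, True, False],
]
fprs, avg_fpr, unanimous, at_least_one = false_positive_stats(verdicts)
```

In the study's terms, a 61.22% average false positive rate means that, on average, a detector wrongly flagged roughly six of every ten genuinely human-written TOEFL essays.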
Researchers went on to test this conclusion by asking ChatGPT to "enhance" the essays: the TOEFL essays were adjusted with the prompt "Enhance the word choices to sound more like that of a native speaker," while the US 8th-grade essays were adjusted with the prompt "Simplify word choices as if written by a non-native speaker." The prompts produced notable shifts in the detection results.
Consequently, the TOEFL essays arrived at a significantly lower rate of false positives, despite the edits being performed by ChatGPT itself. Paradoxically, if students' fully human-written work keeps triggering detectors, they may in fact resort to AI to avoid being accused, rather than being guided and encouraged by professors in their writing-development journeys.
Case Study: Results of AI Detector Hallucinations
Such false accusations have already reached campuses, as professors rush to curb any form of AI plagiarism. With misleading claims such as "99% accuracy on GPT-4" (Originality.ai) or "Millions of users trust [us]" (ZeroGPT), however, biased models are a potential liability for these companies.
William Quarterman, a student at UC Davis, was suddenly informed by his professor that his exam had been determined to be AI-generated and that the writing "[bore] little resemblance to the questions." With GPTZero as the sole evidence, Quarterman was given a failing grade and referred to the Office of Student Support and Judicial Affairs for academic dishonesty. He compiled his Google Docs edit history along with numerous studies on the fallibility of AI detection algorithms, and the case was dismissed, but he noted that the ordeal caused him "full blown panic attacks." Such model failures can cause not only reputational or legal repercussions for the institutions that rely on them, but also emotional damage to the accused.
Mitigating Bias at the Source
To reduce these issues, a holistic approach must be adopted by both end users and AI detection firms.
For the time being, students, especially those who are not native English speakers, should take care to collect evidence of their work or use services, such as Google Docs, that record writing history. Likewise, academic institutions should ensure professors understand how these systems work and their hidden rates of hallucination.
Moreover, the most imperative action falls to the AI detection firms themselves. Providing publicly available software in such a consequential space must be coupled with AI governance policies and risk management, to ensure a fair and unbiased experience for users regardless of background.
Calvin Risk's AI Risk Management platform tackles this challenge at its core, not only identifying the issues arising from the models, but also providing concrete risk scoring and a view of the costs at stake.
Using these tenets, grouped into the macro categories of Technical, Ethical, and Regulatory risk, Calvin Risk identifies and scores each of the key principles of Trustworthy AI: Performance, Robustness & Security, Fairness, Explainability & Transparency, Accountability, and Compliance & Control. With actionable steps and a clearly outlined portfolio of risks, firms can take charge of the faults arising within their systems and assure their users that those systems are reliable and free of bias.
Interested in increasing your models’ fairness and explainability for your client base? Book a demo with us, and let us show you how we can enhance your AI systems' effectiveness and trustworthiness in today’s ever-changing landscape.