By Dimitra Gatzelaki
November 2022: the month ChatGPT was released to the public. Little did we know back then the profound influence it would come to have on our lives, or the domains over which it would spread. Tutor, essay writer, virtual assistant, psychologist, friend; ChatGPT is your modern AI multi-tool. But, like any technological breakthrough, it joins hands with danger (if you’re not convinced of this, I’d recommend reading Daphne du Maurier’s The Breakthrough, which can be read in one sitting). With every passing day, more and more pitfalls of this AI tool seem to surface, like debris in a pond. As more and more similar tools emerge, serving as our personal “cognitive aids”, the natural question is: where will that lead us?
Would you be able to tell if this sentence (or this article, for that matter) was written by AI? When you give an instruction to an LLM, does it know what it is answering as, its place in the schema? Theoretically, it does; when you ask its opinion on a controversial topic, such as the war in Gaza, it evades answering with the ease of running water: “as an AI, I don’t have personal opinions or feelings, but I can provide an analysis based on historical and current information”. In practice, however, ChatGPT works by guessing, word by word, what is most likely to pass as a satisfactory answer to your question. To put it more simply, it runs on probability. So, like all AI, it gives the appearance of being conscious of itself, but this obviously can’t be true (at least for now).
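To make the “it runs on probability” point a little more concrete, here is a toy sketch in Python. It is not ChatGPT’s actual code, and the candidate words and their probabilities are invented purely for illustration; it only shows the basic mechanic of assigning a probability to each possible next word and sampling from that distribution, one word at a time.

```python
import random

# Invented next-word probabilities a language model might assign after some
# prompt. The numbers are made up; real models score tens of thousands of
# candidate tokens at every step.
next_word_probs = {
    "complex": 0.40,
    "difficult": 0.25,
    "ongoing": 0.20,
    "controversial": 0.15,
}

def sample_next_word(probs):
    """Pick the next word at random, weighted by the model's probabilities."""
    words, weights = zip(*probs.items())
    return random.choices(words, weights=weights, k=1)[0]

print(sample_next_word(next_word_probs))
# Note what is missing here: at no point does anything check whether the
# chosen continuation is true. The only question asked is which word is
# statistically most likely to come next.
```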
Ever since ChatGPT came out, people have been using it for tasks “harmless” enough: asking for information, brainstorming ideas, proofreading or editing their own texts. Yet, as its use becomes increasingly widespread, where do we draw the line? How long until every text we read is AI-generated? And what toll will that take on human creativity?
Lately, researchers have underlined the need for accurate, dependable tools to detect the use of ChatGPT and other AI software, primarily in academic contexts. Writing for the MIT Technology Review, Melissa Heikkilä offers some insight into the looming danger of ChatGPT and other LLMs: “[their] magic—and danger— … lies in the illusion of correctness. The sentences they produce look right—they use the right kinds of words in the correct order. But the AI doesn’t know what any of it means. These models work by predicting the most likely next word in a sentence. They haven’t a clue whether something is correct or false, and they confidently present information as true even when it is not”. The implications of handing out false information are clear, especially in academia, where research usually has some social impact. If academics come to rely increasingly on AI technology for their research work, where will that take us? The answer might not be pleasant.
Ever since these concerns began to be voiced, scientists have been researching how (and how accurately) one can predict whether a given text is human- or AI-written, to “counter potential misuses of the technology”. A somewhat reliable first indicator is how “perfect” a text is. According to Google researcher Daphne Ippolito, “a typo in the text is actually a really good indicator that it was human written”: humans tend to produce fluid but messy, error-riddled text, which contrasts sharply with the clean, “perfect” text generated by AI.
Today, one common method of detecting AI use in a text (according to Heikkilä) is to analyze its features with software. But how does an AI detector work, exactly? To put it simply, scientists train a classifier (an algorithm that automatically sorts data into “classes”) on two sets of data: one containing human-written text, the other containing text written by AI (ChatGPT and other LLMs). Classifiers often rely on “statistical techniques” that monitor text features like sentence-length patterns, punctuation, syntactic structure, cohesion, and how often a particular word appears in the output. If you’ve read many ChatGPT-generated texts and you come across the words (amongst others) “intricate”, “tapestry”, and “delve” all in a single document, this raises some red flags. In fact, recent research by Liang et al. (2024) has shown that the usage of certain adjectives in ICLR 2024 peer reviews, such as “innovative”, “meticulous”, “intricate”, and “versatile”, spiked from early 2023 onwards, supporting their hypothesis of AI involvement in these scientific texts.
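For the curious, the sketch below shows the general idea in miniature. It is not the implementation of any real detector: it assumes scikit-learn is available, uses a tiny invented “corpus” of two human-written and two AI-flavoured snippets, and stands in word-frequency features for the richer stylistic signals (sentence length, punctuation, syntax, cohesion) that actual detectors monitor.

```python
# A minimal, illustrative detector: train a classifier on one set of
# human-written texts and one set of AI-generated texts, then ask it to
# label a new text. Requires scikit-learn; all example texts are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

human_texts = [
    "honestly i think the results are kind of messy, but they work",
    "we tried a few things, some failed, heres what stuck in the end",
]
ai_texts = [
    "This intricate tapestry of findings allows us to delve into the topic.",
    "The innovative and meticulous analysis offers a versatile framework.",
]

texts = human_texts + ai_texts
labels = ["human"] * len(human_texts) + ["ai"] * len(ai_texts)

# Word-frequency features feeding a simple classifier: a crude stand-in
# for the "statistical techniques" described above.
detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(texts, labels)

print(detector.predict(["Let us delve into this intricate and versatile tapestry."]))
```

With a toy training set like this, the classifier is really just memorising which words showed up on which side, which is also a hint at why such tools misfire on text that merely happens to share surface features with AI output, the problem discussed next.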
And yet, the use of AI detection tools raises another significant issue: falsely identifying human-written text as produced by AI. This is very much the current state of open-source AI detection software, which seems to be trained on certain parameters and to flag text as AI-written as soon as it matches enough of them. In fact, this makes me wonder whether we’re reaching a point in writing (and specifically academic writing) where we will deliberately make mistakes in our texts, syntactic or otherwise, to convince our readers, or the AI detectors, that we wrote them ourselves. Or, to look at the opposite end of this spectrum, whether LLM usage will one day become the status quo, to the point where we’re all spewing out the same kind of “canned” text, writing in the same way. And, eventually, whether we will reach a point where we’re thinking in the same way.
References
- Heikkilä, M. “How to spot AI-generated text.” MIT Technology Review. Available here
- “‘ChatGPT detector’ catches AI-generated papers with unprecedented accuracy.” Nature. Available here
- Liang, W. et al. “Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews.” arXiv:2403.07183. Available here