The Art and Science of Detecting AI Writing
Chatbots are increasingly used for a variety of tasks, from writing computer code and summarizing books to offering advice. However, these tools are also used to generate text from scratch, with some users presenting the AI's words as their own.
This trend has understandably created challenges for teachers evaluating student work. It also affects anyone seeking genuine advice on forums or relying on product reviews for purchasing decisions.
For the past few years, researchers have been investigating whether it's possible to reliably distinguish human writing from AI-generated text. Interestingly, some of the best clues for telling them apart might come from the chatbots themselves.
Too Good to Be Human
Several recent studies emphasize just how hard it is to tell if text was written by a human or a chatbot.
For instance, a 2021 online study found that participants were unable to differentiate between stories, news articles, and recipes written by humans and those produced by GPT-2 and GPT-3, ChatGPT's predecessors. Even language experts struggle: a 2023 study showed that editors at top linguistics journals could not reliably tell which article abstracts were human-written and which were AI-generated. Furthermore, a 2024 study revealed that 94% of undergraduate exams written by ChatGPT went unnoticed by university graders.
Clearly, humans are not naturally skilled at this task. A common belief is that specific linguistic markers, much like a poker player's tell, can reveal the author's identity.
Researchers have documented a significant rise in the use of uncommon words like “delves” or “crucial” in scientific journals recently. This suggests that certain terms might signal the use of generative AI and implies that researchers may be using bots for their submissions. Whether this constitutes academic misconduct is a topic of ongoing debate.
In another study, participants identified the excessive use of em dashes as a potential sign of computer-generated writing. Yet, even with this knowledge, their detection rate was only slightly better than chance. So why is the em dash considered a clear AI tell? It might be because this punctuation is often used by skilled writers, leading people to assume that text that seems “too good” must be from an AI.
Can Stylometry Solve the Puzzle?
Potential answers may lie in stylometry, a field that uses statistical methods to analyze writing styles. As a cognitive scientist who has studied the history of these techniques, I've seen how they can resolve authorship disputes.
One powerful tool is Burrows’ Delta, a technique that analyzes the frequency of common function words like “the,” “and,” or “to.” It seems counterintuitive that such unremarkable words could carry a signature, but authors use them at habitual rates that are hard to consciously disguise, which makes the method surprisingly effective.
A stylometric technique called Burrows’ Delta was used to identify LaSalle Corbell Pickett as the author of love letters attributed to her deceased husband, Confederate Gen. George Pickett. Source: Encyclopedia Virginia
For example, Burrows’ Delta helped identify the author of a disputed “Wizard of Oz” book and revealed that love letters attributed to a Confederate general were actually written by his widow.
The main drawback is that these techniques require a large volume of text—a 2016 study suggests at least 1,000 words from each author. This makes them impractical for shorter texts like student essays.
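The calculation behind Burrows’ Delta is simple enough to sketch in a few lines. The version below is a minimal, illustrative implementation: the function-word list is abbreviated (real analyses track dozens to hundreds of words), and the texts you would feed it are assumed to be long samples, per the word-count caveat above.

```python
from collections import Counter
import math

# A tiny illustrative list; real analyses use many more of the
# most frequent function words.
FUNCTION_WORDS = ["the", "and", "to", "of", "a", "in", "that", "is", "it", "for"]

def word_freqs(text):
    """Relative frequency of each tracked function word in `text`."""
    words = text.lower().split()
    counts = Counter(words)
    return {w: counts[w] / len(words) for w in FUNCTION_WORDS}

def burrows_delta(disputed, candidates):
    """Compare a disputed text against candidate authors' samples.

    `candidates` maps author name -> reference text. Returns a dict of
    Delta scores; a lower score means a closer stylistic match.
    """
    corpus = {name: word_freqs(t) for name, t in candidates.items()}
    dfreq = word_freqs(disputed)

    # Corpus-wide mean and standard deviation for each function word,
    # used to convert raw frequencies into z-scores.
    stats = {}
    for w in FUNCTION_WORDS:
        vals = [freqs[w] for freqs in corpus.values()]
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals)) or 1e-9
        stats[w] = (mu, sd)

    def z(freqs, w):
        mu, sd = stats[w]
        return (freqs[w] - mu) / sd

    # Delta = mean absolute difference between the z-scores of the
    # disputed text and each candidate, across all tracked words.
    return {
        name: sum(abs(z(freqs, w) - z(dfreq, w)) for w in FUNCTION_WORDS)
              / len(FUNCTION_WORDS)
        for name, freqs in corpus.items()
    }
```

Given reference samples from several candidate authors, the author whose sample yields the lowest Delta is the most likely match for the disputed text.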
The Challenge of a Moving Target
More recent approaches fine-tune language models such as BERT on large samples of human-written and AI-generated text. These detectors are far more accurate than people, with reported success rates between 80% and 98%. However, they are “black boxes,” meaning we don't fully understand which textual features they rely on for detection. Researchers are working to interpret these models, but for now, their decision process remains a mystery.
Another major challenge is that AI models are constantly evolving. Early in 2025, users noticed ChatGPT had become excessively polite and agreeable, a trait OpenAI later toned down. While a human author's style may also change over time, it is typically a more gradual process.
What the AI Says About Itself
Curious, I asked ChatGPT-4o directly: “How can I tell if some prose was generated by ChatGPT? Does it have any ‘tells,’ such as characteristic word choice or punctuation?”
The bot admitted that distinguishing human from AI prose “can be tricky” but provided a 10-item list of its own potential tells. These included using hedging words like “often” and “generally,” redundancy, an overreliance on lists, and a “polished, neutral tone.” It also pointed to a “predictable vocabulary” featuring words like “significant,” “notable,” “implication,” and “complexity.” While acknowledging these are common features, it concluded that “none are definitive on their own.”
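The bot's list reads almost like a feature specification, and it can be turned into a toy checker. The sketch below counts a few of those self-reported tells, plus the em dashes discussed earlier. The word lists are abbreviated from the bot's answer, and the tallies are purely illustrative: as the bot itself cautioned, none of these features are definitive on their own.

```python
import re

# Abbreviated from the chatbot's own answer; these are hypotheses
# about AI style, not validated markers.
HEDGES = {"often", "generally"}
STOCK_VOCABULARY = {"significant", "notable", "implication", "complexity"}

def count_tells(text):
    """Tally a few of the self-reported 'tells' in a passage of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "hedges": sum(w in HEDGES for w in words),
        "stock_vocabulary": sum(w in STOCK_VOCABULARY for w in words),
        "em_dashes": text.count("\u2014"),  # the em dash character
    }
```

A higher tally across many passages might prompt a closer look, but on its own it proves nothing about authorship.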
Chatbots are known to hallucinate or make factual errors. But when it comes to self-analysis, they seem surprisingly perceptive.