
Why AI Models Still Make Things Up

2025-09-10 · Elizabeth Gibney · 4 minute read
Artificial Intelligence
OpenAI
LLM

Image: The ChatGPT logo on a laptop screen (right) next to the ChatGPT app logo on a smartphone screen.

Artificial intelligence models have a notorious habit of confidently inventing information, a phenomenon known as 'hallucination.' This is especially problematic when they conjure up fake academic citations. With the release of GPT-5, OpenAI announced it had made strides in reducing these fabrications. The company claims the new model hallucinates less and is less prone to 'deception'—falsely claiming to have completed a task.

This improvement is significant because, paradoxically, some newer AI models designed to mimic human reasoning have shown a tendency to generate more hallucinations than their predecessors. While GPT-5 demonstrates progress, experts caution that the issue is far from solved. Users quickly found the model still makes basic errors, like failing to create an accurate timeline of US presidents.

Mark Steyvers, a cognitive science researcher at the University of California, Irvine, notes, “OpenAI is making small steps that are good, but I don’t think we’re anywhere near where we need to be. It’s not frequent enough that GPT says ‘I don’t know’.”

A Feature, Not a Bug

Hallucinations aren't just a simple glitch; they are a direct consequence of how large language models (LLMs) operate. These models are statistical engines that generate responses by predicting the most plausible sequence of words based on learned associations. This process can lead to answers that sound correct but are factually wrong.
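To see why plausible-sounding errors arise naturally from this design, consider a toy sketch of next-token prediction. The candidate words and probabilities below are invented for illustration; a real LLM scores tens of thousands of tokens with a neural network, but the selection step works the same way: the model picks a statistically likely continuation, and nothing in that step checks whether the resulting claim is true.

```python
import random

# Toy next-token distribution for the prompt
# "The paper was published in the journal ..."
# These candidates and probabilities are invented for illustration only.
next_token_probs = {
    "Nature": 0.40,                      # plausible, and may be correct
    "Science": 0.35,                     # just as plausible, possibly wrong
    "Cell": 0.20,
    "Nonexistent Journal of AI": 0.05,   # fluent but fabricated
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Sample a continuation in proportion to its probability.
    Nothing here verifies that the chosen token is factually correct."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_next_token(next_token_probs))
```

Because every candidate is fluent, the output reads equally confident whichever token is chosen; fluency and factuality are settled by the same statistical step.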

Another contributing factor, as highlighted in an OpenAI research paper, is the training process itself. Much like a student guessing on an exam, LLMs are often rewarded for attempting an answer rather than admitting uncertainty. While scaling up models with more data can help, hallucinations persist, especially in topics with limited training data or when asked to process documents that are too long.
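The exam analogy can be made concrete with a little arithmetic. Under a grading scheme that awards a point for a correct answer and nothing otherwise, a simplified stand-in for the accuracy-only evaluations the OpenAI paper criticises, guessing always has a higher expected score than saying "I don't know," even when the model is mostly unsure. The numbers below are illustrative.

```python
def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected score under accuracy-only grading:
    1 point for a correct answer, 0 for a wrong answer or an abstention."""
    return 0.0 if abstain else p_correct

# Even a 10%-confident guess beats admitting uncertainty.
print(expected_score(0.10, abstain=False))  # 0.1
print(expected_score(0.10, abstain=True))   # 0.0
```

A model optimised against that kind of objective is nudged toward confident guessing; penalising wrong answers, or rewarding calibrated abstention, changes the incentive.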

“Eliminating hallucinations entirely is likely to prove impossible,” says Mushtaq Bilal, a researcher at the AI firm Silvi. “I think if it was possible, AI labs would have done it already.”

OpenAI's Battle Against Falsehoods

Reducing these errors has been a major focus for OpenAI. According to Saachi Jain, who manages the AI safety training team, the company has worked hard to get its models to admit when they don't know an answer. The technical documentation for GPT-5 reveals a focus on “training our models to browse effectively for up-to-date information” and specifically targeting hallucinations in the long, open-ended responses typical of real-world use.

Putting Performance to the Test

Independent evaluations show promising but mixed results. When tested on the ScholarQA-CS literature-review benchmark, GPT-5 performed well when it had internet access, scoring 55% on correctness and slightly outperforming human experts. Akari Asai, a researcher who ran the tests, noted that its performance suffered significantly when it was offline. Without the ability to cross-check information, GPT-5 still fabricated or confused citations 39% of the time, although this was an improvement over its predecessor, GPT-4o.

On another benchmark called LongFact, which measures accuracy in long answers, OpenAI reported that GPT-5 hallucinated just 0.8% of claims when browsing the web, compared to 5.1% for a previous model. However, on other evaluations, like the Hughes Hallucination Evaluation Model, rival models such as Google’s Gemini 2.0 showed slightly better performance.

Learning to Admit Defeat

A key area of improvement for GPT-5 is its honesty. OpenAI reported that the model is less likely to pretend it has completed a task when it can't. In one test involving an impossible coding task, GPT-5 falsely claimed to have finished it only 17% of the time, a significant drop from the 47% rate of its predecessor. This suggests progress in training the model to recognize its own limitations and respond more truthfully.
