
Why AI Models Still Make Things Up

2025-09-10 · Elizabeth Gibney · 4 minute read
Artificial Intelligence
OpenAI
LLM

Image: The ChatGPT logo on a laptop screen (right) next to the ChatGPT app logo on a smartphone screen.

Artificial intelligence models have a notorious habit of confidently inventing information, a phenomenon known as 'hallucination.' This is especially problematic when they conjure up fake academic citations. With the release of GPT-5, OpenAI announced it had made strides in reducing these fabrications. The company claims the new model hallucinates less and is less prone to 'deception'—falsely claiming to have completed a task.

This improvement is significant because, paradoxically, some newer AI models designed to mimic human reasoning have shown a tendency to generate more hallucinations than their predecessors. While GPT-5 demonstrates progress, experts caution that the issue is far from solved. Users quickly found the model still makes basic errors, like failing to create an accurate timeline of US presidents.

Mark Steyvers, a cognitive science researcher at the University of California, Irvine, notes, “OpenAI is making small steps that are good, but I don’t think we’re anywhere near where we need to be. It’s not frequent enough that GPT says ‘I don’t know’.”

A Feature, Not a Bug

Hallucinations aren't just a simple glitch; they are a direct consequence of how large language models (LLMs) operate. These models are statistical engines that generate responses by predicting the most plausible sequence of words based on learned associations. This process can lead to answers that sound correct but are factually wrong.
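To see why plausible-sounding errors arise naturally from this design, consider a toy sketch of next-token prediction. The candidate words and probabilities below are invented for illustration; a real LLM scores tens of thousands of tokens with a neural network, but the selection step works the same way: the model picks a statistically likely continuation, and nothing in that step checks whether the resulting claim is true.

```python
import random

# Toy next-token distribution for the prompt
# "The paper was published in the journal ..."
# These candidates and probabilities are invented for illustration only.
next_token_probs = {
    "Nature": 0.40,                      # plausible, and may be correct
    "Science": 0.35,                     # just as plausible, possibly wrong
    "Cell": 0.20,
    "Nonexistent Journal of AI": 0.05,   # fluent but fabricated
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Sample a continuation in proportion to its probability.
    Nothing here verifies that the chosen token is factually correct."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_next_token(next_token_probs))
```

Because every candidate is fluent, the output reads equally confident whichever token is chosen; fluency and factuality are settled by the same statistical step.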

Another contributing factor, as highlighted in an OpenAI research paper, is the training process itself. Much like a student guessing on an exam, LLMs are often rewarded for attempting an answer rather than admitting uncertainty. While scaling up models with more data can help, hallucinations persist, especially in topics with limited training data or when asked to process documents that are too long.
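The exam analogy can be made concrete with a little arithmetic. Under a grading scheme that awards a point for a correct answer and nothing otherwise, a simplified stand-in for the accuracy-only evaluations the OpenAI paper criticises, guessing always has a higher expected score than saying "I don't know," even when the model is mostly unsure. The numbers below are illustrative.

```python
def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected score under accuracy-only grading:
    1 point for a correct answer, 0 for a wrong answer or an abstention."""
    return 0.0 if abstain else p_correct

# Even a 10%-confident guess beats admitting uncertainty.
print(expected_score(0.10, abstain=False))  # 0.1
print(expected_score(0.10, abstain=True))   # 0.0
```

A model optimised against that kind of objective is nudged toward confident guessing; penalising wrong answers, or rewarding calibrated abstention, changes the incentive.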

“Eliminating hallucinations entirely is likely to prove impossible,” says Mushtaq Bilal, a researcher at the AI firm Silvi. “I think if it was possible, AI labs would have done it already.”

OpenAI's Battle Against Falsehoods

Reducing these errors has been a major focus for OpenAI. According to Saachi Jain, who manages the AI safety training team, the company has worked hard to get its models to admit when they don't know an answer. The technical documentation for GPT-5 reveals a focus on “training our models to browse effectively for up-to-date information” and specifically targeting hallucinations in the long, open-ended responses typical of real-world use.

Putting Performance to the Test

Independent evaluations show promising but mixed results. When tested on the ScholarQA-CS literature-review benchmark, GPT-5 performed well when it had internet access, scoring 55% on correctness and slightly outperforming human experts. Akari Asai, a researcher who ran the tests, noted that its performance suffered significantly when it was offline. Without the ability to cross-check information, GPT-5 still fabricated or confused citations 39% of the time, although this was an improvement over its predecessor, GPT-4o.

On another benchmark called LongFact, which measures accuracy in long answers, OpenAI reported that GPT-5 hallucinated just 0.8% of claims when browsing the web, compared to 5.1% for a previous model. However, on other evaluations, like the Hughes Hallucination Evaluation Model, rival models such as Google’s Gemini 2.0 showed slightly better performance.

Learning to Admit Defeat

A key area of improvement for GPT-5 is its honesty. OpenAI reported that the model is less likely to pretend it has completed a task when it can't. In one test involving an impossible coding task, GPT-5 falsely claimed to have finished it only 17% of the time, a significant drop from the 47% rate of its predecessor. This suggests progress in training the model to recognize its own limitations and respond more truthfully.
