
OpenAI Uncovers Why AI Chatbots Make Things Up

2025-09-06 · Lakshmi Varanasi · 2 min read
Tags: AI, OpenAI, Machine Learning

The Persistent Problem of AI Hallucinations

Researchers from OpenAI believe they have pinpointed a major reason behind one of the most significant challenges in artificial intelligence: model hallucinations.

A hallucination occurs when a large language model (LLM) confidently presents incorrect information as fact. The issue is widespread and known to plague the most popular LLMs, affecting even advanced models like OpenAI's GPT-5 and Anthropic's Claude.

Why Faking It Is a Feature, Not a Bug

In a paper released on Thursday, OpenAI researchers revealed their key finding: LLMs hallucinate because their training methods inadvertently encourage guessing over admitting a lack of knowledge.

Essentially, the models are trained to be excellent test-takers. On a typical exam, guessing when you're unsure can improve your overall score. The paper notes, "Hallucinations persist due to the way most evaluations are graded — language models are optimized to be good test-takers, and guessing when uncertain improves test performance."

This means that large language models operate in a constant "test-taking mode," treating every query as a question with a binary right or wrong answer. This approach doesn't align with the real world, where uncertainty and nuance are common.
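A quick back-of-the-envelope sketch (ours, not the paper's) makes the incentive concrete: under accuracy-only grading, even a low-confidence guess has a higher expected score than admitting uncertainty.

```python
# Toy illustration (not from the OpenAI paper): expected score per question
# under accuracy-only grading, where a correct answer earns 1 point and both
# wrong answers and "I don't know" earn 0.
def expected_score_accuracy_only(p_correct: float, abstain: bool) -> float:
    """Expected points for one question if the model guesses or abstains."""
    if abstain:
        return 0.0        # abstaining never earns credit
    return p_correct      # guessing earns credit whenever the guess happens to be right

# Even a wild guess (say, a 10% chance of being right) beats abstaining:
print(expected_score_accuracy_only(0.10, abstain=False))  # 0.1
print(expected_score_accuracy_only(0.10, abstain=True))   # 0.0
```

Because the guess can only ever add points, a model optimized against this kind of scoreboard learns to answer every question, however unsure it is.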

As the researchers put it, "Humans learn the value of expressing uncertainty outside of school, in the school of hard knocks. On the other hand, language models are primarily evaluated using exams that penalize uncertainty."

A Path Forward by Changing the Scorecard

The good news is that this problem has a potential solution. The OpenAI team suggests that the fix lies in redesigning the evaluation metrics used to train and test these models.

The core issue, they argue, is "the abundance of evaluations that are not aligned." To solve it, "The numerous primary evaluations must be adjusted to stop penalizing abstentions when uncertain."

In a blog post accompanying the paper, OpenAI clarified what this change would look like: "The widely used, accuracy-based evals need to be updated so that their scoring discourages guessing. If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess." By changing the rules of the game, we can train AI to be more honest about what it doesn't know, leading to more reliable and trustworthy models.
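One way to picture such a change is negative marking, a hypothetical rubric (not the specific scheme OpenAI proposes) in which wrong answers cost points and abstentions cost nothing, so guessing only pays off when the model is genuinely confident.

```python
# Hypothetical "adjusted" scoring: correct = +1, wrong = -1, abstain = 0.
# Under this rubric, guessing has positive expected value only when the
# model's chance of being right exceeds 50%.
def expected_score_with_penalty(p_correct: float, abstain: bool,
                                wrong_penalty: float = 1.0) -> float:
    """Expected points for one question under a scheme that penalizes wrong guesses."""
    if abstain:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

print(expected_score_with_penalty(0.10, abstain=False))  # -0.8: a wild guess now hurts
print(expected_score_with_penalty(0.10, abstain=True))   #  0.0: abstaining is the better move
print(expected_score_with_penalty(0.90, abstain=False))  #  0.8: confident answers still win
```

Under a scoreboard like this, the optimal strategy shifts from "always answer" to "answer only when confident," which is exactly the behavior the researchers want to encourage.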
