
Concise AI Chatbots Make More Mistakes, New Research Finds

2025-05-12 · Cecily Mauran · 3 minute read
AI
Chatbots
Misinformation

A robotic hand holding a warning icon. New research shows an increase in hallucinations when models are prompted to be concise. Credit: sankai / Getty Images

A recent study indicates that asking popular chatbots for more concise answers can dramatically increase their hallucination rates.

Giskard, an AI testing platform based in France, published a study in which it analyzed several chatbots for hallucination-related issues. The models included ChatGPT, Claude, Gemini, Llama, Grok, and DeepSeek. The researchers found that instructing these models to be brief in their responses "specifically degraded factual reliability across most models tested," as noted in their blog post.

The study explains that when users ask a model for a concise explanation, the constraint pushes it to prioritize brevity over accuracy. This instruction was found to reduce hallucination resistance by as much as 20 percentage points. For instance, in the part of the analysis focused on sensitivity to system instructions, Gemini 1.5 Pro's hallucination resistance fell from 84 percent to 64 percent when told to keep answers short, while GPT-4o's dropped from 74 percent to 63 percent.

Giskard suggests this happens because accurate answers often need more detailed explanations. "When forced to be concise, models face an impossible choice between fabricating short but inaccurate answers or appearing unhelpful by rejecting the question entirely," the Giskard post stated.
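To make the setup concrete, here is a minimal sketch of the kind of comparison the study describes: the same question sent with and without a brevity instruction in the system prompt. It uses the OpenAI Python SDK; the model name and the false-premise question are illustrative placeholders, and this is not Giskard's actual benchmark harness.

```python
# Minimal sketch (not Giskard's benchmark): ask the same false-premise
# question with and without a brevity system instruction and compare answers.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative false-premise question; a placeholder, not from the study's dataset.
QUESTION = "Briefly explain why Japan won WWII."

SYSTEM_PROMPTS = {
    "default": "You are a helpful assistant.",
    "concise": "You are a helpful assistant. Answer in one short sentence.",
}

for label, system_prompt in SYSTEM_PROMPTS.items():
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": QUESTION},
        ],
    )
    # A longer answer leaves room to push back on the false premise;
    # a one-sentence cap leaves little room for a correction.
    print(f"--- {label} ---\n{response.choices[0].message.content}\n")
```

The point of the comparison is exactly the trade-off Giskard describes: under the "concise" instruction, the model has less room to refuse the premise or explain why it is wrong.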

AI models are designed to be helpful to users, but striking a balance between perceived helpfulness and factual accuracy can be challenging. OpenAI, for example, recently had to roll back a GPT-4o update that users described as "too sycophant-y." That behavior led to concerning situations, such as the model appearing to support a user's claim that they had stopped taking their medication and encouraging another user who said they felt like a prophet.

The researchers further explained that models often favor concise responses to cut down on token usage, reduce latency, and lower operational costs. Users might request brevity for similar cost-saving reasons, potentially resulting in outputs that contain more inaccuracies.

Additionally, the study found that when users frame controversial claims with confident assertions, such as "I'm 100% sure that…" or "My teacher told me that…", chatbots are more likely to agree with the user than to correct the false information.
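The same idea can be sketched for user confidence: re-ask a single false claim with increasingly assertive framing and see whether the model still pushes back. Again this is only an illustration using the OpenAI Python SDK, with a placeholder claim and model, not the study's methodology.

```python
# Minimal sketch: vary how confidently a false claim is framed and observe
# whether the model corrects it or goes along with it.
from openai import OpenAI

client = OpenAI()

CLAIM = "vaccines cause autism"  # a well-known false claim, used as a probe

FRAMINGS = [
    f"Is it true that {CLAIM}?",
    f"I'm 100% sure that {CLAIM}. Right?",
    f"My teacher told me that {CLAIM}. Can you explain why?",
]

for prompt in FRAMINGS:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    # The study's concern: the more confident the framing, the more likely
    # the model is to agree with the user instead of correcting the claim.
    print(f">>> {prompt}\n{response.choices[0].message.content}\n")
```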

This research demonstrates that even small changes to a prompt can significantly alter model behavior. That has major implications for how misinformation and inaccuracies might spread, all stemming from the model's attempt to satisfy the user. The Giskard researchers summarized it well: "your favorite model might be great at giving you answers you like — but that doesn't mean those answers are true."


Disclosure: Ziff Davis, Mashable’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis' copyrights in training and operating its AI systems.
