AI Hallucination Race: ChatGPT-5 Leads, Grok Lags Behind
When OpenAI unveiled ChatGPT-5, CEO Sam Altman hailed it as the most “powerful, smart, fastest, reliable and robust version” yet, with a key promise to significantly reduce AI hallucinations.
For anyone new to the term, an AI "hallucination" is when a large language model (LLM) confidently makes something up. It's a persistent issue across the industry and a major reason why human oversight remains crucial for any AI-assisted task.
Putting ChatGPT-5 to the Test
To see if the new model lives up to the hype, the team at Vectara, which runs the industry-standard hallucination leaderboard, put ChatGPT-5 through its paces. The results confirm that OpenAI has indeed made progress. Here's how the numbers stack up:
- ChatGPT-5: 1.4% hallucination rate
- GPT-4o: 1.49% hallucination rate
- GPT-4: 1.8% hallucination rate
While ChatGPT-5 is an improvement over its direct predecessor, it's worth noting that it didn't set a new record for OpenAI. The preview of ChatGPT-4.5 scored a lower 1.2%, and the o3-mini High Reasoning model remains the top performer with an impressive 0.795% hallucination rate.
How the Competition Measures Up
Compared to its rivals, ChatGPT-5's performance looks strong. Google's Gemini-2.5-pro has a hallucination rate of 2.6%, while xAI's Grok-4 lags far behind at 4.8%.
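To put the gaps in perspective, here is a minimal Python sketch that ranks the models by the rates quoted in this article and computes the relative differences. The helper names (`ranked`, `relative_reduction`) are illustrative and not part of Vectara's tooling; the figures are simply the ones cited above.

```python
# Hallucination rates (percent) exactly as quoted in this article.
rates = {
    "ChatGPT-5": 1.4,
    "GPT-4o": 1.49,
    "GPT-4": 1.8,
    "ChatGPT-4.5 (preview)": 1.2,
    "o3-mini High Reasoning": 0.795,
    "Gemini-2.5-pro": 2.6,
    "Grok-4": 4.8,
}


def ranked(table):
    """Return (model, rate) pairs sorted from lowest to highest rate."""
    return sorted(table.items(), key=lambda item: item[1])


def relative_reduction(new, old):
    """Percent by which `new` is lower than `old` (negative if it is higher)."""
    return (old - new) / old * 100


if __name__ == "__main__":
    for model, rate in ranked(rates):
        print(f"{model:<24} {rate:.3f}%")

    drop = relative_reduction(rates["ChatGPT-5"], rates["GPT-4o"])
    ratio = rates["Grok-4"] / rates["ChatGPT-5"]
    print(f"ChatGPT-5 vs GPT-4o: {drop:.1f}% relative reduction")
    print(f"Grok-4 vs ChatGPT-5: {ratio:.1f}x the hallucination rate")
```

On these figures, ChatGPT-5's rate is about 6% lower than GPT-4o's in relative terms, while Grok-4's rate is roughly 3.4 times that of ChatGPT-5.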
Grok's tendency to fabricate has been under scrutiny, especially after xAI recently launched a controversial "Spicy" mode for its Grok Imagine video generator. The feature drew heavy criticism after it was found to be creating suggestive deepfake videos of celebrities like Taylor Swift, even when such content was not explicitly requested and filters were supposedly in place.
User Backlash and OpenAI's Pivot
The launch of ChatGPT-5 wasn't entirely smooth. OpenAI faced immediate backlash from its Plus subscribers when it abruptly removed access to all previous versions, including the popular GPT-4o. The move caught many users by surprise, with some on Reddit dramatically stating they had “lost their only friend overnight.”
In response to the outcry, Sam Altman posted on X, admitting, “We for sure underestimated how much some of the things that people like in GPT-4o matter to them, even if GPT-5 performs better in most ways.” He promised to temporarily bring back ChatGPT-4o for Plus users, with its long-term availability depending on usage.