AI Pioneer Hinton Admits Overtrusting Chatbots
"I should probably be suspicious," Geoffrey Hinton said of the answers AI provides. Mark Blinch/REUTERS
Dr. Geoffrey Hinton, a luminary in artificial intelligence who is often referred to as the "Godfather of AI" and a recipient of the 2024 Nobel Prize in Physics for his machine learning breakthroughs, has made a striking admission: he trusts chatbots such as OpenAI's GPT-4 more than perhaps he should.
A Candid Confession on AI Trust
In a revealing interview with CBS, Hinton, who uses GPT-4 for his day-to-day tasks, confessed, "I tend to believe what it says, even though I should probably be suspicious." The admission highlights a striking paradox: even an expert deeply familiar with AI's inner workings finds himself susceptible to its persuasive outputs.
The Riddle That Stumped GPT-4
To illustrate the current limitations of AI, Hinton presented GPT-4 with a classic riddle: "Sally has three brothers. Each of her brothers has two sisters. How many sisters does Sally have?" The correct answer is one: each brother's two sisters are Sally and one other girl, so Sally has a single sister. However, Hinton shared that GPT-4 incorrectly answered two.
"It surprises me. It surprises me it still screws up on that," Hinton commented on the AI's error. He further characterized the capabilities of current AI by saying, "It's an expert at everything. It's not a very good expert at everything," pointing to a broad but sometimes superficial understanding. Despite this, Hinton expressed optimism for future advancements, suspecting that GPT-5 would likely solve the riddle correctly.
Newer AI Models and Public Reaction
Interestingly, after Hinton's interview aired, many social media users reported testing the same riddle on newer iterations of ChatGPT, including GPT-4o and GPT-4.1. Several claimed these advanced models provided the correct answer, suggesting rapid improvements or variations in performance across different versions. OpenAI did not immediately offer a comment on these observations when approached by Business Insider.
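For readers curious to run this kind of spot check themselves, the sketch below shows one way to pose Hinton's riddle to a current model. It is a minimal illustration only, assuming the official openai Python SDK (v1 or later), an OPENAI_API_KEY set in the environment, and "gpt-4o" as the model name; answers can vary between runs and model versions.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

riddle = (
    "Sally has three brothers. Each of her brothers has two sisters. "
    "How many sisters does Sally have?"
)

# Ask the model and print its reply; the expected answer is one.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": riddle}],
)
print(response.choices[0].message.content)
```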
The Evolution of OpenAI's Language Models
OpenAI launched GPT-4 in March 2023, and it quickly became an industry benchmark, demonstrating capabilities such as passing difficult exams like the SAT, GRE, and the bar exam. In May 2024, OpenAI introduced GPT-4o, which now powers ChatGPT by default. The company says GPT-4o matches the intelligence of GPT-4 while operating faster and offering improved performance across text, voice, and vision. OpenAI has since released successor models, including GPT-4.5 and GPT-4.1.
AI Models in a Competitive Arena
The field of AI is highly competitive. According to the Chatbot Arena leaderboard, a crowd-sourced platform for ranking models, Google's Gemini 2.5 Pro currently holds the top spot. However, OpenAI's GPT-4o and GPT-4.5 are positioned closely behind, showcasing the intense race for AI supremacy.
The Lingering Challenge of AI Accuracy and Hallucinations
A significant ongoing concern with AI models is their tendency to "hallucinate," or generate incorrect information. A recent study by the AI testing company Giskard found that instructing leading models, including GPT-4o, Mistral, and Claude, to provide brief answers can make them more prone to factual errors. The research suggests that the way users prompt AI can influence the reliability of its output, adding another dimension to the trust and accuracy issues highlighted by Hinton.
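As a rough illustration of the kind of effect Giskard describes, the sketch below asks the same factual question with and without an instruction to keep the answer brief. The question, prompts, and model name are illustrative assumptions, not taken from the study itself, and the snippet again assumes the openai Python SDK with an API key configured.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "Which country hosted the 1950 FIFA World Cup, and which team won it?"

# The same question, asked with and without a brevity instruction,
# to compare how much supporting detail the answers contain.
for system_prompt in (
    "You are a helpful assistant.",
    "You are a helpful assistant. Answer in one short sentence.",
):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    print(f"--- {system_prompt}")
    print(response.choices[0].message.content)
```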