
Why An Old Atari Game Beat ChatGPT at Chess

2025-07-30 · Euny Hong · 4 minute read
Generative AI
Chess
Technology

Last month, the AI world was buzzing after Video Chess, a 1979 game for the Atari 2600, defeated ChatGPT and Microsoft Copilot at chess. The experiment, documented by Citrix engineer Robert Caruso, led to headlines like “AI schooled by 50-year-old Atari.”

For many, this was a classic David and Goliath story, seemingly proving that generative AI is more hype than substance. However, according to AI experts, the outcome is not surprising at all. It simply highlights what generative AI is designed to do—and what it is not. As IBM Distinguished Engineer Chris Hay put it, “Thinking ChatGPT can do chess is like thinking it can be your girlfriend or therapist.”

The LLM Brain vs. the Chess Engine

The fundamental reason for the AI chatbots' loss lies in how they operate. Large Language Models (LLMs) are not designed for rule-based logic or strategic planning. Their entire architecture is based on learned correlations between words. “These models are essentially given the whole of the internet and then they’re trained to predict the next word,” Hay explained. They function on next-token prediction, which is a world away from the strategic calculations needed for chess.
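To make “learned correlations between words” concrete, here is a toy bigram model, a drastically simplified sketch of next-token prediction (real LLMs use neural networks over vast corpora, and the tiny corpus below is invented for illustration). It only knows which word tends to follow which; it has no notion of rules or planning.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus standing in for "the whole of the internet".
corpus = "the knight takes the pawn and the rook takes the knight".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # → knight
```

The model picks “knight” only because that word most often followed “the” in its training data. Scaled up enormously, this is still prediction from correlation, not calculation.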

In contrast, the Atari Video Chess game is a specialist. It uses a “brute force method,” a common technique in early strategy games. PJ Hagerty, Lead of AI Advocacy at IBM, explained that this method involves “a logic tree using averages to determine the best possible move.” The Atari is coded specifically to analyze the board, search through a tree of possible moves, and pick the one its evaluation rates best. It’s a focused search problem, a task the Atari was built for.
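The brute-force idea can be sketched as a minimax search: score every line of play a fixed depth ahead and pick the move whose worst-case outcome is best. This is a minimal generic sketch, not the Atari’s actual code; the toy “game” at the bottom is invented purely to make it runnable.

```python
def minimax(state, depth, maximizing, moves_fn, apply_fn, eval_fn):
    """Exhaustively score `state` by searching `depth` plies ahead."""
    moves = moves_fn(state)
    if depth == 0 or not moves:
        return eval_fn(state)
    scores = [minimax(apply_fn(state, m), depth - 1, not maximizing,
                      moves_fn, apply_fn, eval_fn) for m in moves]
    return max(scores) if maximizing else min(scores)

def best_move(state, depth, moves_fn, apply_fn, eval_fn):
    """Pick the move whose worst-case outcome is best for the mover."""
    return max(moves_fn(state),
               key=lambda m: minimax(apply_fn(state, m), depth - 1, False,
                                     moves_fn, apply_fn, eval_fn))

# Toy demo: a "state" is just a running score, a "move" adds a delta.
moves_fn = lambda s: [-2, -1, 1, 2]
apply_fn = lambda s, m: s + m
eval_fn = lambda s: s
print(best_move(0, 2, moves_fn, apply_fn, eval_fn))  # → 2
```

Swap in real chess move generation and board evaluation and this same skeleton becomes a (very slow) chess engine, which is essentially what the Atari cartridge does at shallow depth.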

Understanding the Game Tree: Brute Force vs. Pruning

To grasp the complexity, consider how chess works. Every move creates a new set of possibilities, forming a massive “game tree”. A historic 1956 match between Donald Byrne and Bobby Fischer lasted 41 moves, or 82 plies (half-moves). With an average of roughly 30 legal moves per position, the full game tree for a game that long contains about 30^82 positions: a number 122 digits long.
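The arithmetic behind that figure is easy to check, assuming the article’s rough averages of 30 moves per position over 82 plies:

```python
branching, plies = 30, 82
tree_size = branching ** plies   # every line of play, exhaustively
print(len(str(tree_size)))       # number of decimal digits → 122
```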

Humans and sophisticated modern chess computers don’t analyze every single branch. They use a process called “pruning” to instinctively or algorithmically eliminate bad moves and focus on promising ones. The 46-year-old Atari can’t prune; instead, it brute-forces every option for the next one or two moves and picks the best one. A generative AI tool like ChatGPT, however, can do neither. It cannot perform a brute force search, nor does it have the strategic understanding to prune the decision tree.
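The standard algorithmic form of pruning is alpha-beta pruning. The sketch below is a hedged illustration, not any particular engine’s code: it returns the same answer as a plain exhaustive search, but skips branches that provably cannot affect the result (the toy game at the bottom is invented for the demo).

```python
def alphabeta(state, depth, alpha, beta, maximizing,
              moves_fn, apply_fn, eval_fn):
    """Minimax with alpha-beta cutoffs: same result, fewer branches."""
    moves = moves_fn(state)
    if depth == 0 or not moves:
        return eval_fn(state)
    if maximizing:
        best = float("-inf")
        for m in moves:
            best = max(best, alphabeta(apply_fn(state, m), depth - 1,
                                       alpha, beta, False,
                                       moves_fn, apply_fn, eval_fn))
            alpha = max(alpha, best)
            if alpha >= beta:   # opponent already has a better option:
                break           # prune the remaining siblings
        return best
    else:
        best = float("inf")
        for m in moves:
            best = min(best, alphabeta(apply_fn(state, m), depth - 1,
                                       alpha, beta, True,
                                       moves_fn, apply_fn, eval_fn))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best

# Toy demo: a "state" is a running score, a "move" adds a delta.
moves_fn = lambda s: [-2, -1, 1, 2]
apply_fn = lambda s, m: s + m
eval_fn = lambda s: s
print(alphabeta(0, 2, float("-inf"), float("inf"), True,
                moves_fn, apply_fn, eval_fn))  # → 0
```

The cutoff is the whole trick: once one reply refutes a move, the search stops exploring that move’s other replies, which is exactly the branch-elimination the article describes.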

Can AI Learn to Play Chess?

This doesn't mean LLMs will never be good at chess. According to Hay, it’s a matter of equipping them with the right capabilities. “If you were to tell [ChatGPT], ‘I give you permission to generate code,’ as well as access to a notepad for planning where it could keep track of the game, I bet it could probably win.”

Currently, LLMs lack the built-in agency to define a goal like winning at chess and then select the tools needed to achieve it. However, this is changing with the development of tool calling—the ability for AI models to interact with external tools and APIs. This is an active area of research at places like IBM’s Granite model project.

A Case of Mistaken Confidence

One lingering question is why the AIs were so confident they would win. ChatGPT reportedly challenged the Atari, and Copilot claimed it could think 10-15 moves ahead. This apparent arrogance is a byproduct of their design.

AI experts categorize this as a form of hallucination, where models provide inaccurate or nonsensical information. LLMs are trained with reinforcement learning from human feedback (RLHF) to sound helpful and confident, because users are less likely to trust a system that constantly says, “I’m not sure.”

As Ash Minhas from IBM explained, the AI is “just a stochastic parrot trying to be helpful.” It’s not boastful; it’s simply generating a response that it predicts is most likely to be seen as helpful and confident. The LLMs aren’t too big for their britches; they are just operating as designed. Or, as ChatGPT itself admitted, “Fact: I do not possess subjective awareness.”
