Atari 2600 Outsmarts ChatGPT In Surprising Chess Match
An experiment by Citrix engineer Robert Caruso exposed an unexpected weakness in OpenAI's highly touted ChatGPT. The Large Language Model (LLM), usually known for its sophisticated responses, confidently volunteered to play chess against a basic chess program, eager to demonstrate "how quickly" it could secure a victory, only to find itself thoroughly outplayed by a vintage Atari 2600.
(Image credit: Jordan Lye via Getty.)
Specialized AI vs General LLMs in Chess
Dedicated chess engines, such as Stockfish or DeepMind's AlphaZero, are built specifically for the game and consistently outperform even the best human players. ChatGPT 4o is a leading LLM, but it is not designed for chess the way these specialized engines are.
Despite this distinction, one might still anticipate a more competent performance from such an advanced AI. The challenge arose when a discussion with ChatGPT about the history of AI in chess prompted the LLM to volunteer for a match against Atari Chess. As Caruso detailed on LinkedIn, ChatGPT was keen to "find out how quickly it could beat a game that only thinks 1-2 moves ahead on a 1.19 MHz CPU."
So, what was the outcome?
A Crushing Defeat for the AI
"ChatGPT got absolutely wrecked on the beginner level," Caruso reported. The LLM struggled significantly: "Despite being given a baseline board layout to identify pieces, ChatGPT confused rooks for bishops, missed pawn forks, and repeatedly lost track of where pieces were." Initially, it attributed its errors to the abstract nature of the Atari icons, but its performance did not improve even when switched to standard chess notation. Caruso noted, "It made enough blunders to get laughed out of a 3rd grade chess club."
The Humble Opponent: Atari's Video Chess
To put this into perspective, Video Chess for the Atari 2600 is an extremely rudimentary chess program, a product of its time. Its programmers had to fit a functional chess engine into just 4KB of cartridge ROM, double the 2KB typical of early VCS games. The game essentially brute-forces its choice, trying the available moves and looking only a move or two ahead, with no broader strategy or long-range foresight.
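To make "brute-forcing a move or two ahead" concrete, here is a minimal sketch of a fixed-depth exhaustive search in Python. It is not Video Chess's actual code, which the article does not describe; the function `shallow_search`, the toy position tree, and the scores are all hypothetical, chosen only to show how a one-move lookahead can pick a move that a two-move lookahead rejects.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy position tree: each position lists the positions reachable in one move.
TREE: Dict[str, List[str]] = {
    "root": ["grab_pawn", "develop"],
    "grab_pawn": ["gp_reply_a", "gp_reply_b"],
    "develop": ["dv_reply_a", "dv_reply_b"],
}

# Hypothetical static scores from the searching player's point of view (material balance).
SCORES: Dict[str, int] = {
    "grab_pawn": 1, "develop": 0,
    "gp_reply_a": -8, "gp_reply_b": 1,   # reply A wins the searcher's queen
    "dv_reply_a": 0, "dv_reply_b": 1,
}


def shallow_search(position: str, depth: int, maximizing: bool = True) -> Tuple[int, Optional[str]]:
    """Try every move to a fixed depth and return (score, best_move)."""
    moves = TREE.get(position, [])
    if depth == 0 or not moves:
        # Out of lookahead (or no moves left): fall back to the static score.
        return SCORES.get(position, 0), None

    best_score: Optional[int] = None
    best_move: Optional[str] = None
    for move in moves:
        score, _ = shallow_search(move, depth - 1, not maximizing)
        better = (
            best_score is None
            or (maximizing and score > best_score)
            or (not maximizing and score < best_score)
        )
        if better:
            best_score, best_move = score, move
    return best_score if best_score is not None else 0, best_move


one_ply = shallow_search("root", depth=1)
two_ply = shallow_search("root", depth=2)
print("1 move ahead:", one_ply)
print("2 moves ahead:", two_ply)
```

In this toy tree, looking one move ahead favors grabbing the pawn, while looking two moves ahead sees the opponent's reply and prefers the quieter move. An engine limited to this kind of search has no plan beyond its horizon, which is why a reasonably careful opponent should be able to beat it.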
A reasonably skilled human player would typically find Video Chess a straightforward opponent. Over the 90-minute session, however, Caruso says he "had to stop [ChatGPT] from making awful moves and correct its board awareness multiple times per turn." The AI repeatedly suggested starting over, promising improvement, but eventually, "even ChatGPT knew it was beat—and conceded with its head hung low."
(Image credit: Brian Mitchell via Getty.)
ChatGPT's Bold Challenge and Unexpected Offer
The challenge was entirely ChatGPT's initiative. Following a discussion about powerful chess AIs like Stockfish and AlphaZero, the LLM "proclaimed it would easily win" against an Atari. It expressed curiosity about "how quickly it could win" and, noting Caruso's self-assessment as a weak player, even "offered to teach me strategy along the way."
Glimmers of Competence Amidst Confusion
It wasn't a complete failure, however. Caruso acknowledged that when ChatGPT did manage to maintain an accurate understanding of the board, it provided "solid guidance" and occasionally demonstrated "genuinely impressive" insights. Yet, these moments were interspersed with instances, familiar to many ChatGPT users, where "it made absurd suggestions… or tried to move pieces that had already been captured, even during turns when it otherwise had an accurate view of the board."
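The specific failure Caruso describes, trying to move a piece that has already been captured, is one that explicit state tracking rules out. The sketch below is purely illustrative and says nothing about how ChatGPT works internally; the `BoardTracker` class and its piece codes (such as "wP" for a white pawn) are hypothetical, and it deliberately does not check whether a move is legal under the rules of chess.

```python
from typing import Dict, List


class BoardTracker:
    """Keeps an explicit square-to-piece map that is updated after every move."""

    def __init__(self, pieces: Dict[str, str]) -> None:
        # Example entry: "e4" -> "wP" (white pawn). Codes are an assumption of this sketch.
        self.squares: Dict[str, str] = dict(pieces)
        self.captured: List[str] = []

    def move(self, src: str, dst: str) -> None:
        piece = self.squares.get(src)
        if piece is None:
            # Nothing stands here: the piece moved away earlier or was captured.
            raise ValueError(f"no piece on {src}")
        target = self.squares.get(dst)
        if target is not None:
            if target[0] == piece[0]:
                raise ValueError(f"{dst} is occupied by a friendly piece")
            self.captured.append(target)  # record the capture
        del self.squares[src]
        self.squares[dst] = piece


board = BoardTracker({"e4": "wP", "d5": "bP", "g1": "wN"})
board.move("e4", "d5")          # white pawn captures the black pawn
print(board.squares)            # the black pawn is gone from the map
print(board.captured)

try:
    board.move("e4", "e5")      # e4 is now empty, so this is rejected
except ValueError as err:
    print("rejected:", err)
```

Because every move goes through a single record of the position, a stale belief about the board surfaces immediately as an error rather than as a confident but impossible suggestion.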
Broader Implications for LLM Understanding
While AI proponents might argue that this experiment is insignificant because chess is not an LLM's primary function, the outcome does raise questions about how well the technology keeps track of context. "Its inability to retain a basic board state from turn to turn was very disappointing," Caruso reflected. "Is that really any different from forgetting other crucial context in a conversation?" The episode points to possible limits in how LLMs maintain an accurate, up-to-date picture of a situation over the course of a long exchange.
Caruso concluded his account with a playful nod to Atari's classic slogan: "Have you played Atari today? ChatGPT wishes it hadn't."