How A 4KB Atari Game Humbled Modern AI Chatbots
In a fascinating face-off between past and present, a nearly 50-year-old Atari chess game, running from just 4KB of cartridge ROM, managed to defeat two of today's most advanced AI chatbots. Citrix engineer Robert Caruso detailed the experiment on LinkedIn, revealing how 1979's Video Chess for the Atari 2600 left both Microsoft Copilot and OpenAI's ChatGPT in checkmate.
The Challenge: ChatGPT vs. 1979 Atari Chess
The entire experiment began after a conversation with ChatGPT about chess engines. According to Caruso, the chatbot grew confident, claiming it was a "strong player in its own right and would easily beat Atari's Video Chess." This bold claim set the stage for a unique challenge.
Caruso set up the 1979 game using the Stella emulator and began the match. Despite the AI's confidence, the reality was quite different. During a 90-minute game, ChatGPT struggled immensely. It consistently confused the game pieces and could not maintain an accurate understanding of the board's state, even with Caruso providing corrections. Ultimately, the vintage Atari program, playing on its beginner difficulty setting, defeated ChatGPT.
Round Two: Copilot's Disastrous Debut
Not stopping there, Caruso decided to test Microsoft's prized AI, Copilot. "Imagine everyone's head exploding if a MICROSOFT product outperformed ChatGPT," he wrote. Copilot also made bold claims, stating it could keep track of the board, unlike its rival. However, its performance was even more embarrassing.
When asked to render the board state, Copilot produced an incorrect version from the start. The game went downhill quickly. Caruso reported, "By the seventh turn, it had lost two pawns, a knight, and a bishop — for only a single pawn in return — and was now instructing me to place its queen right in front of the Atari’s queen to be captured on the next turn."
The final score was decisive:
Atari 2600 Video Chess: 2
Modern LLMs: 0
Caruso has even considered extending the experiment to other models like Google Gemini.
AI Hype vs. Reality: A Critical Look
This amusing experiment highlights a serious disconnect. We are constantly told that AI is on the verge of replacing skilled professionals. Yet, when put to a simple test of logic and memory, these sophisticated models fail against a 1979 program small enough to fit on a tiny fraction of a floppy disk.
The core issue is that Large Language Models (LLMs) like ChatGPT and Copilot lack abstract thinking and persistent, stateful memory. They don't "learn" or "understand" in a human sense; they are incredibly advanced predictive text systems that regurgitate patterns from the data they were trained on. This experiment shows that without a human constantly feeding them a correct description of the position, they cannot reliably track even a simple game state.
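To see how small an ask this is for conventional software, here is a minimal Python sketch of explicit, persistent game-state tracking, the very thing the chatbots could not manage. It is purely illustrative, uses the third-party python-chess library, and has nothing to do with Caruso's actual setup:

```python
# Illustrative sketch only (not part of Caruso's experiment):
# explicit, persistent game state via the third-party python-chess library.
import chess

board = chess.Board()                    # the authoritative board state lives in one object
for move in ["e4", "e5", "Nf3", "Nc6"]:
    board.push_san(move)                 # each move updates that single source of truth

print(board.fen())                       # the exact position, reproducible at any moment
print(len(list(board.legal_moves)))      # legality checking comes along for free
```

An engine like Video Chess keeps exactly this kind of state in a handful of bytes of RAM; an LLM, by contrast, only "remembers" whatever board description happens to survive in its conversation context.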
The Human Cost and The Bigger Questions
Even tech pioneers like Bill Gates have expressed skepticism that AI can replicate genuine creativity and human judgment. Despite this, major companies continue to push the narrative of AI replacing human roles. In a move of staggering irony, Microsoft executives promoted AI resume-writing tools on LinkedIn shortly after conducting mass layoffs, which were themselves intended to funnel more resources into AI.
The Atari chess match leaves us with critical questions. If today's premier AIs can't handle an 8-bit game from the 70s, why should we trust them with sensitive medical data or complex energy grids? And why are we so determined to expend massive energy resources in this pursuit when the foundational logic appears so fragile?