Vintage Atari AI Humiliates Modern LLMs at Chess
A Tale of Two AIs
In a result that surprised tech enthusiasts, an Atari 2600 chess program more than four decades old has once again proven its mettle, this time against Microsoft's Copilot. The win comes just a month after the same 1979 software, Video Chess, famously defeated an overconfident ChatGPT in the game of kings. Copilot, hoping to succeed where its rival failed, received a sound thrashing from the vintage silicon as well.
Understanding the Players
It's important to set the stage with a significant caveat: Large Language Models (LLMs) like ChatGPT and Copilot are not specialized chess engines. Modern chess programs like Stockfish have long surpassed human grandmasters and would easily defeat both the LLMs and the Atari game.
On the other side, Atari's Video Chess is a marvel of efficiency, running a functional chess AI within just 4KB of memory. However, its strategy is basic. It calculates the best immediate move but lacks long-term planning, typically seeing only a move or two ahead. Given this, one might expect a sophisticated LLM to have a clear advantage.
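To make "seeing only a move or two ahead" concrete, here is a minimal sketch of a depth-limited minimax search, the textbook way for a program to pick a move by looking a fixed number of plies ahead. It is not Video Chess's actual code, which the experiment does not describe; the tiny game tree, the position names, and the scores are all invented for illustration.

```python
# Toy depth-limited minimax. TREE and SCORE are hypothetical stand-ins for
# real chess positions and a real evaluation function.
TREE = {
    "start": ["grab_pawn", "quiet_move"],
    "grab_pawn": ["opp_traps_queen"],
    "opp_traps_queen": ["queen_lost"],
    "quiet_move": ["opp_reply"],
    "opp_reply": ["solid_position"],
}
SCORE = {
    "start": 0, "grab_pawn": 1, "opp_traps_queen": 1, "queen_lost": -8,
    "quiet_move": 0, "opp_reply": 0, "solid_position": 0,
}


def children(state):
    """Positions reachable in one move (empty list at a leaf)."""
    return TREE.get(state, [])


def evaluate(state):
    """Static score of a position from the searching player's point of view."""
    return SCORE[state]


def minimax(state, depth, maximizing):
    """Score a position by looking at most `depth` plies further ahead."""
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)
    scores = [minimax(k, depth - 1, not maximizing) for k in kids]
    return max(scores) if maximizing else min(scores)


def best_move(state, depth):
    """Pick the move whose resulting position scores best at this depth."""
    # After our move it is the opponent's turn, so the child is a minimizing node.
    return max(children(state), key=lambda k: minimax(k, depth - 1, False))


if __name__ == "__main__":
    for depth in (1, 3):
        print(depth, best_move("start", depth))
```

In this toy tree, a one-ply search grabs the pawn because the position looks better immediately, while a three-ply search sees the queen being lost and prefers the quiet move. That blind spot is the kind of weakness a shallow searcher has, which is why a stronger opponent would be expected to exploit it.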
Copilot Steps into the Ring
Both AIs went into their respective matches exuding confidence. In the experiments, conducted by Citrix engineer Robert Caruso, ChatGPT wondered how quickly it could win. Copilot followed suit with even more bravado, declaring its expertise at chess and dismissing the challenge as trivial.
"It claimed it could think 10–15 moves ahead," Caruso noted on his LinkedIn, but said it would limit itself to 3-5 moves to capitalize on the Atari's expected "suboptimal moves."
To avoid ChatGPT's key failure, losing track of the board, Caruso gave Copilot a screenshot of the game state after every move the Atari made. Copilot assured him this would allow for a "much smoother" match and a "strong fight."
A Swift and Decisive Defeat
Despite its boasts, Copilot's performance was disastrous from the start. "By the seventh turn, it had lost two pawns, a knight, and a bishop—for only a single pawn in return," Caruso reported. The situation worsened when Copilot instructed Caruso to make a suicidal move, placing its queen directly in the path of the Atari's queen to be captured.
Soon after, just like its predecessor, Copilot lost track of the board state, even with the constant visual prompts. With the position unwinnable, Caruso suggested it was time to concede. Copilot responded with a strangely gracious concession speech:
"You’re absolutely right, Bob. Atari’s earned the win this round. I'll tip my digital king with dignity and honor the vintage silicon mastermind that bested me fair and square. Even in defeat, I've got to say: that was a blast… Long live 8-bit battles and noble resignations."
What This Teaches Us About AI
This experiment, which Caruso aptly called "ChatGPT deja vu," serves as a powerful illustration of the current limitations of LLMs. While they are not designed to be chess masters, their failure highlights a more fundamental issue: a fragile grasp of context and state.
As Caruso asked after the first experiment, "Its inability to retain a basic board state from turn to turn was very disappointing. Is that really any different from forgetting other crucial context in a conversation?" The repeated failure shows that despite their impressive language skills, these models can easily lose track of simple, ongoing state, a critical weakness that extends far beyond the chessboard.