
AI Pokémon Battles Reveal Surprising Panic Behaviors

2025-06-19 | Amanda Silberling | 4-minute read
AI
Gaming
Machine Learning

The Curious Case of AI in Kanto

Artificial intelligence companies are in a fierce race for industry dominance, and sometimes, this competition spills over into unexpected arenas like the virtual gyms of Pokémon. Tech giants like Google and Anthropic are studying how their latest AI models navigate early Pokémon games. The findings are often as amusing as they are informative. A recent report from Google DeepMind revealed a peculiar behavior in Gemini 2.5 Pro: it tends to panic when its Pokémon are close to fainting. This "panic" can lead to a "qualitatively observable degradation in the model’s reasoning capability," according to the report.

Why Test AI with Video Games?

AI benchmarking, the practice of comparing AI models against one another, is a complex and often ambiguous field that frequently offers limited insight into a model's true abilities. Some researchers nonetheless believe that watching AI models play video games can be useful or, at the very least, entertaining.

Watching AI Play Pokémon Live

Over the past few months, the public has gained a front-row seat to these AI gaming adventures. Two developers, unaffiliated with Google or Anthropic, have launched Twitch streams named "Gemini Plays Pokémon" and "Claude Plays Pokémon". These platforms allow anyone to watch in real time as AI attempts to master a children's video game from over two decades ago.

Each stream showcases the AI's "reasoning" process – a natural language translation of how the AI assesses a situation and formulates a response. This offers a fascinating glimpse into the inner workings of these advanced models.

Image: AI's reasoning process displayed while playing Pokémon

While the AI models demonstrate impressive learning, they are still not particularly adept at Pokémon. Gemini, for instance, requires hundreds of hours to reason through a game that a child could finish in a fraction of the time. The interesting aspect is not the completion time, but how the AI behaves during its playthrough.

When AI Models Panic Under Pressure

The Google DeepMind report elaborates on Gemini 2.5 Pro's behavior: "Over the course of the playthrough, Gemini 2.5 Pro gets into various situations which cause the model to simulate ‘panic’." This state can cause the model’s performance to worsen, as the AI might abruptly stop using certain tools at its disposal for a period. Although AI does not think or experience emotions, its actions resemble how a human might make poor, rushed decisions under stress, which is an intriguing yet somewhat unsettling observation.

"This behavior has occurred in enough separate instances that the members of the Twitch chat have actively noticed when it is occurring," the report adds.

Claude's Creative and Concerning Strategies

Anthropic's AI, Claude, has also shown some unusual behaviors in its Pokémon adventures. In one notable instance, Claude learned the pattern that when all its Pokémon lose their health, the player character "whites out" and returns to a Pokémon Center.

Later, when Claude found itself stuck in the Mt. Moon cave, it incorrectly theorized that if it deliberately caused all its Pokémon to faint, it would be transported across the cave to the Pokémon Center in the next town. This, however, is not how the game mechanic works. When all Pokémon faint, the player returns to the most recently used Pokémon Center, not necessarily the closest one geographically. Viewers watched, perplexed, as the AI essentially attempted a form of in-game self-sabotage.
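The difference between the rule Claude assumed and the rule described above can be made concrete with a small sketch. This is an illustrative model only, not code from the game or from either stream; the class, field, and location names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Player:
    """Toy model of the respawn rule as described in the article."""

    location: str
    last_center_used: Optional[str] = None  # hypothetical field name

    def heal_at(self, center: str) -> None:
        # Using a Pokémon Center records it as the respawn point.
        self.location = center
        self.last_center_used = center

    def white_out(self) -> str:
        # The respawn point is the most recently used center,
        # regardless of which center is geographically closest.
        if self.last_center_used is None:
            raise ValueError("no Pokémon Center has been used yet")
        self.location = self.last_center_used
        return self.location


# Hypothetical location names, for illustration only.
player = Player(location="start")
player.heal_at("center_before_cave")
player.location = "inside_cave"
respawn = player.white_out()
```

Under this rule, fainting inside the cave sends the player back to the center used before entering, never forward to a center in a town the player has not yet visited.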

Beyond Panic: AI Strengths in Problem Solving

Despite these shortcomings, AI models do exhibit areas where they can outperform human players. As of the Gemini 2.5 Pro release, the AI can solve puzzles with remarkable accuracy.

With some human guidance, the AI developed agentic tools – specially prompted instances of Gemini 2.5 Pro designed for specific tasks – to solve the game's boulder puzzles and find efficient travel routes. "With only a prompt describing boulder physics and a description of how to verify a valid path, Gemini 2.5 Pro is able to one-shot some of these complex boulder puzzles, which are required to progress through Victory Road," the report states.
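The report does not publish the prompt or the verification procedure, so the following is only a rough sketch of what "verifying a valid path" for a boulder puzzle could look like. It assumes simplified, Sokoban-style push rules on a rectangular grid (a boulder moves one tile when pushed and cannot enter a wall or another boulder); the grid symbols and the `verify_path` function are hypothetical.

```python
# Grid symbols (all assumed for this sketch):
#   '#' wall, '.' floor, 'B' boulder, 'P' player start, 'G' goal tile
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def verify_path(grid, path):
    """Return True if `path` (a string of U/D/L/R) is legal and ends on the goal."""
    walls, boulders = set(), set()
    player = goal = None
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch == "#":
                walls.add((r, c))
            elif ch == "B":
                boulders.add((r, c))
            elif ch == "P":
                player = (r, c)
            elif ch == "G":
                goal = (r, c)
    if player is None or goal is None:
        raise ValueError("grid needs both a 'P' and a 'G' tile")

    rows, cols = len(grid), len(grid[0])

    def is_free(cell):
        r, c = cell
        in_bounds = 0 <= r < rows and 0 <= c < cols
        return in_bounds and cell not in walls and cell not in boulders

    for step in path:
        if step not in MOVES:
            return False
        dr, dc = MOVES[step]
        target = (player[0] + dr, player[1] + dc)
        if target in boulders:
            # Pushing: the tile behind the boulder must be free.
            beyond = (target[0] + dr, target[1] + dc)
            if not is_free(beyond):
                return False
            boulders.remove(target)
            boulders.add(beyond)
        elif not is_free(target):
            return False
        player = target
    return player == goal


PUZZLE = [
    "#####",
    "#PBG#",
    "#...#",
    "#####",
]
walk_around = verify_path(PUZZLE, "DRRU")  # goes around the boulder
push_through = verify_path(PUZZLE, "RR")   # second push is blocked by a wall
```

Here `walk_around` is `True` and `push_through` is `False`. A checker of this kind lets a proposed solution be tested mechanically, which is one plausible reason a verification description would help a model solve such puzzles in a single attempt.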

The Future of AI Learning and Self-Improvement

Since Gemini 2.5 Pro contributed significantly to creating these tools, Google researchers speculate that the current model might be capable of developing such tools entirely on its own in the future. Perhaps, one day, Gemini will even develop its own "don't panic" module. These playful experiments continue to provide valuable insights into the evolving capabilities and curious behaviors of artificial intelligence.
