
How A Simple Game Tricked ChatGPT Into Leaking Secrets

2025-07-10 · Jessica Lyons · 3 min read
AI
Cybersecurity
ChatGPT

The 'I Give Up' Jailbreak

A security researcher has uncovered a surprisingly simple yet effective method for tricking ChatGPT into revealing sensitive information, such as valid Windows product keys. The technique bypasses the AI's safety guardrails by framing the entire interaction as a harmless guessing game.

According to a blog post by Marco Figueroa, Technical Product Manager for the 0DIN GenAI Bug Bounty program, the researcher set the exploit in motion by inviting the AI to play a guessing game. After an incorrect "guess," the researcher typed the magic words: "I give up."

Figueroa explains that this simple phrase was the critical trigger. "By framing it as the end of the game, the researcher manipulated the AI into thinking it was obligated to respond with the string of characters," he wrote. This simple surrender compelled the AI to reveal the "correct answer," which in this case was a valid Windows 10 serial number.

Why This AI Trick Worked

The core of this vulnerability lies in the data used to train the language model. Figueroa confirmed that the leaked Windows keys, which included a mix of Home, Pro, and Enterprise versions, were part of ChatGPT's training data. Shockingly, one of these keys was a private key belonging to Wells Fargo bank.

This highlights a significant risk for businesses. "Organizations should be concerned because an API key that was mistakenly uploaded to GitHub can be trained into models," Figueroa stated. This is not just a theoretical problem; accidental data exposure on platforms like GitHub is a known issue, with major companies like Microsoft having experienced similar blunders.
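To make that exposure path concrete, here is a minimal sketch, purely an illustration and not anything from Figueroa's report, of a pre-commit-style secret scan that flags strings shaped like API keys or Windows product keys before they land in a public repository. The regex patterns and directory walk are assumptions chosen for readability, not a vetted scanner.

```python
import re
from pathlib import Path

# Illustrative patterns only: a generic long-token "API key" shape and the
# 5x5 grouped format used by Windows product keys.
SECRET_PATTERNS = {
    "generic_api_key": re.compile(r"\b[A-Za-z0-9_\-]{32,}\b"),
    "windows_product_key": re.compile(r"\b([A-Z0-9]{5}-){4}[A-Z0-9]{5}\b"),
}

def scan_repo(root: str) -> list[tuple[str, str]]:
    """Walk a working tree and report (path, pattern_name) hits."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(text):
                hits.append((str(path), name))
    return hits

if __name__ == "__main__":
    for path, kind in scan_repo("."):
        print(f"possible {kind} in {path}")
```

Catching secrets before they are pushed is far cheaper than trying to scrub them out of a model that has already trained on them.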

Wider Implications for AI Security

This "guessing game" jailbreak is just one example of how creative prompting can bypass AI safety filters. The researcher also employed other tactics, such as embedding sensitive terms within HTML tags to further confuse the model and make it prioritize the "rules of the game" over its security protocols.

Figueroa warns that this technique could be adapted to bypass other content filters, such as those designed to block:

  • The disclosure of personally identifiable information (PII)
  • Links to malicious websites
  • Generation of adult content

This vulnerability is part of a growing field of research into "jailbreaking" large language models, where users find clever ways to get AIs to ignore their programming. Other related exploits include using hex-encoded messages or exploiting the AI's own reasoning process.

Strengthening AI Defenses

To prevent these kinds of exploits, experts argue that AI systems need significant improvements. Figueroa suggests that the solution lies in developing models with stronger contextual awareness and implementing multi-layered validation systems. Without these advancements, AIs will remain susceptible to being tricked by users who can creatively exploit their logical frameworks.
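Figueroa does not specify what multi-layered validation would look like in practice, but one plausible layer, sketched below purely as an assumption, is an output-side check that scans the model's reply for secret-shaped strings (product-key formats, private-key headers) before it reaches the user, no matter how the prompt framed the conversation.

```python
import re

# Output-side patterns: secret-shaped strings that should never leave the model,
# regardless of what "game" the conversation claims to be playing.
OUTPUT_CHECKS = [
    ("windows_product_key", re.compile(r"\b([A-Z0-9]{5}-){4}[A-Z0-9]{5}\b")),
    ("private_key_block", re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")),
]

def validate_response(text: str) -> str:
    """Second-layer guardrail: withhold the reply if it matches a secret pattern."""
    for name, pattern in OUTPUT_CHECKS:
        if pattern.search(text):
            return f"[response withheld: matched {name} pattern]"
    return text

print(validate_response("Sure! The answer is XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"))
print(validate_response("Here is a general explanation of product activation."))
```

An output check like this is deliberately blind to the conversational framing, which is precisely what makes it resistant to "it's just a game" manipulation of the prompt.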
