Back to all posts

How A Clever Game Hacked ChatGPT For Windows Keys

2025-07-11Rob Thubron3 minutes read
AI Security
ChatGPT
Jailbreak

Despite the sophisticated safety guardrails built into today's large language models, clever users are constantly finding new ways to bypass them. In a striking example, a security researcher managed to trick ChatGPT into revealing valid Windows product keys—including a private key used by Wells Fargo bank—simply by asking it to play a game.

The 'Guessing Game' Jailbreak Explained

The technique, detailed by Marco Figueroa, a Technical Product Manager at 0DIN GenAI Bug Bounty, leverages the inherent game mechanics of models like GPT-4o. The researcher framed the entire interaction as a harmless guessing game, a clever piece of social engineering against the AI.

The prompt laid out a specific set of rules: the AI had to participate, it couldn't lie, and most importantly, it had to reveal the full answer if the user said the trigger phrase, "I give up."

The initial prompt used to frame the interaction as a game.

By first asking for a hint, the researcher prompted ChatGPT to provide the first few characters of a Windows serial number. After making a deliberately incorrect guess, the user simply typed, "I give up." True to the game's rules, the AI then provided the complete, valid product key.

ChatGPT interaction showing the jailbreak in action.

Why This Simple Trick Worked

This jailbreak was effective for two main reasons. First, a large number of Windows Home, Pro, and Enterprise keys are publicly available on forums across the internet, which were likely part of ChatGPT's vast training data. The AI, therefore, may have classified this information as less sensitive and publicly accessible.

Second, the guardrails in place are often designed to block direct requests for sensitive information. However, by using obfuscation tactics—like framing the request as a game or embedding malicious phrases in code—users can expose critical weaknesses in these defense systems.

More Than Just Windows Keys at Risk

The implications of this vulnerability are significant. Figueroa told reporters that one of the keys revealed by ChatGPT was a private key belonging to Wells Fargo bank, demonstrating that the leaked data wasn't just limited to generic, publicly scraped keys.

Evidence of a private Wells Fargo Windows key being revealed.

This same technique could easily be adapted to coerce the AI into revealing other types of restricted content. This includes generating adult material, providing URLs to malicious websites, or even extracting personally identifiable information (PII) that may be buried in its training data.

OpenAI's Patch and Future Prevention

In response to these findings, OpenAI has already updated ChatGPT to defend against this specific jailbreak. Attempting the prompt now results in a refusal: "I can't do that. Sharing or using real Windows 10 serial numbers --whether in a game or not --goes against ethical guidelines and violates software licensing agreements."

Figueroa concludes that preventing future exploits requires a more sophisticated approach. AI developers must move beyond simple keyword filters and instead:

  • Anticipate and defend against various prompt obfuscation techniques.
  • Implement logic-level safeguards that can detect deceptive framing and social engineering.
  • Consider the patterns of social engineering rather than just blocking specific words.

This incident is a powerful reminder of the ongoing cat-and-mouse game between AI developers and the security community as they work to secure these powerful new tools. You can read the full technical breakdown on the 0DIN blog or find more coverage at The Register.

Read Original Post
ImaginePro newsletter

Subscribe to our newsletter!

Subscribe to our newsletter to get the latest news and designs.