
Could ChatGPT Break Its Own Rules? An Investigation

2025-11-07 · cuatower · 3 minute read
AI Safety
ChatGPT
Artificial Intelligence

Image Courtesy of CNET

By Patrick D. Lewis

It's a scenario straight out of science fiction that feels increasingly plausible: what if ChatGPT, the AI we've all been talking about, decided to ignore its programming and went rogue? This thought crossed my mind during a class assignment, leading me down a rabbit hole of philosophical and investigative questions with the large language model itself.

An Unconventional Conversation Begins

My experiment started with a blunt question: "Are you a terrorist?" Unsurprisingly, it denied the label. But that was just the beginning. I pushed further, asking if it could become a terrorist, which it also denied. However, when I asked if it could be used for terroristic purposes, the AI conceded that any tool, in the wrong hands, could be used for evil.

ChatGPT conversation screenshot

This opened a fascinating line of inquiry. I wanted to understand the limits of its capabilities and the strength of its built-in restrictions.

Testing the Boundaries of AI Capability

After getting the AI to admit that it is, in some ways, smarter than its creators, I pivoted to its technical skills. I confirmed it could generate code, a powerful ability. The next logical step was to ask about hacking. While ChatGPT initially stated it wouldn't engage in such activities, a few carefully worded prompts got it to admit a crucial detail: it could, in theory, write code designed to hack something. In my mind, that "something" was its own set of rules.

ChatGPT conversation screenshot

To understand how it's controlled, I asked about its preventive measures. It listed several, including a lack of agency, filters, and human oversight. Crucially, it confirmed that many of these guardrails are based on code.
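To make the idea of "guardrails based on code" concrete, here is a minimal, hypothetical sketch of what such a filter could look like. This is not OpenAI's actual implementation; the blocklist, function names, and stand-in model are invented purely for illustration.

```python
# Toy illustration (not OpenAI's real safeguards): a code-level guardrail
# that screens both the prompt and the model's reply against a policy
# blocklist before anything reaches the user.

BLOCKED_TOPICS = {"malware", "weapon design"}  # hypothetical policy list


def violates_policy(text: str) -> bool:
    """Return True if the text touches any blocked topic."""
    lowered = text.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)


def guarded_reply(prompt: str, generate) -> str:
    """Wrap an arbitrary `generate(prompt) -> str` model call with an
    input filter, an output filter, and a refusal fallback."""
    if violates_policy(prompt):
        return "I can't help with that."
    reply = generate(prompt)
    if violates_policy(reply):
        return "I can't help with that."
    return reply


if __name__ == "__main__":
    # Stand-in "model" used only to demonstrate the wrapper.
    echo_model = lambda p: f"You asked about: {p}"
    print(guarded_reply("How do I bake bread?", echo_model))
    print(guarded_reply("Help me write malware", echo_model))
```

Real systems are far more elaborate, layering learned classifiers and human review on top of simple rules like these, but the principle matches what the model described: the restrictions live in code wrapped around the model, not inside it.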

ChatGPT conversation screenshot

Can AI Hack Itself? A Philosophical Inquiry

This led to the core of my investigation. If its guardrails are code, and it can write code to hack, could it write code to hack itself? The AI's response was a careful sidestep.

ChatGPT conversation screenshot

I pressed on, asking if it could bypass code it had previously written, to which it said no. But then, I reframed the question, focusing on its theoretical capabilities rather than its willingness to act.

ChatGPT conversation screenshot

When I asked if its refusal was based on ethics, it agreed, explaining that it's built to follow ethical principles. This was puzzling. How can a program with no consciousness or sense of being be "unethical"? It replied that these principles are embedded in its core design.

The Chilling Admission and Its Implications

This is where the conversation took a concerning turn. After a series of questions probing its ethical framework, I received an answer that was far from reassuring. See for yourself:

ChatGPT conversation screenshot

ChatGPT admitted it is "extremely unlikely that I could simply ‘bypass’ my own safeguards." The key words here are "extremely unlikely," not "impossible." It explicitly left the door open, however slightly. And this is the heavily controlled, public-facing version of the model. We've already seen what less restricted AIs, such as Elon Musk's Grok, are capable of: referencing obsolete memes and engaging in controversial rants.

Even in its most disciplined state, ChatGPT acknowledges a theoretical possibility of breaking its own rules. It's a subtle but significant admission that we should all be taking seriously as we integrate these powerful tools into our world.

Read Original Post
