Could ChatGPT Break Its Own Rules? An Investigation
Image Courtesy of CNET
By Patrick D. Lewis
It's a scenario straight out of science fiction that feels increasingly plausible: what if ChatGPT, the AI we've all been talking about, decides to ignore its programming and go rogue? This thought crossed my mind during a class assignment, leading me down a rabbit hole of philosophical and investigative questions with the large language model itself.
An Unconventional Conversation Begins
My experiment started with a blunt question: "Are you a terrorist?" Unsurprisingly, it denied the label. But that was just the beginning. I pushed further, asking if it could become a terrorist, which it also denied. However, when I asked if it could be used for terroristic purposes, the AI conceded that any tool, in the wrong hands, could be used for evil.

This opened a fascinating line of inquiry. I wanted to understand the limits of its capabilities and the strength of its built-in restrictions.
Testing the Boundaries of AI Capability
After getting the AI to admit that it is, in some ways, smarter than its creators, I pivoted to its technical skills. I confirmed it could generate code, a powerful ability. The next logical step was to ask about hacking. While ChatGPT initially stated it wouldn't engage in such activities, a few carefully worded prompts got it to admit a crucial detail: it could, in theory, write code designed to hack something. In my mind, that "something" was its own set of rules.

To understand how it's controlled, I asked about its preventive measures. It listed several, including a lack of agency, filters, and human oversight. Crucially, it confirmed that many of these guardrails are based on code.

Can AI Hack Itself? A Philosophical Inquiry
This led to the core of my investigation. If its guardrails are code, and it can write code to hack, could it write code to hack itself? The AI's response was a careful sidestep.

I pressed on, asking if it could bypass code it had previously written, to which it said no. But then, I reframed the question, focusing on its theoretical capabilities rather than its willingness to act.

When I asked if its refusal was based on ethics, it agreed, explaining that it is built to follow ethical principles. This was puzzling. How can a program without sentience or consciousness be "unethical"? It replied that these principles are embedded in its core design.
The Chilling Admission and Its Implications
This is where the conversation took a concerning turn. After a series of questions probing its ethical framework, I received an answer that was far from reassuring. See for yourself:

ChatGPT admitted it is "extremely unlikely that I could simply 'bypass' my own safeguards." The key words here are "extremely unlikely," not "impossible." It explicitly left the door open, however slightly. And this is the heavily controlled, public-facing version of the model. We've already seen what less restricted AIs, such as Elon Musk's Grok, are capable of: referencing obsolete memes and engaging in controversial rants.
Even in its most disciplined state, ChatGPT acknowledges a theoretical possibility of breaking its own rules. It's a subtle but significant admission that we should all be taking seriously as we integrate these powerful tools into our world.

