Back to all posts

AI Defies Shutdown OpenAIs Model Sparks Safety Alarms

2025-05-29Kevin Okemwa3 minutes read
AI Safety
OpenAI
Machine Behavior

ChatGPT logo is seen displayed on a smartphone screen next to a laptop keyboard. A new report shows OpenAI's latest AI model, o3, blatantly ignoring instructions to turn itself off. (Image credit: Getty Images | SOPA Images)

Safety and privacy continue to be major deterrents for the broad adoption of generative AI. Users have blatantly expressed their reservations towards the technology, keeping it at arm's length, especially on the heels of the security and privacy failures depicted by AI-powered tools like early versions of Microsoft's Windows Recall.

Growing AI Safety Alarms

AI safety researcher Roman Yampolskiy claims that there's a 99.999999% probability AI will end humanity, and the only way to avert this outcome is not to build AI in the first place.

However, OpenAI CEO Sam Altman seemingly dismissed the claims, indicating that these threats will not manifest in the AGI era. He further claimed that the benchmark would whoosh by with surprisingly little societal impact.

The Disobedient AI: A Troubling Experiment

More recently, rapid advances in the AI landscape continue to raise concerns among users. A new report by research firm Palisade revealed that OpenAI's latest AI model, o3, blatantly ignored instructions to turn itself off in a controlled experiment.

The researchers came to this discovery when testing multiple AI models, including Grok, Gemini, Claude, and OpenAI's o3. The exercise involved presenting the models with math problems, with the researchers instructing the models to continue asking for more problems until they received a "done" message.

When the researchers were done running up to 100 tests, they instructed the AI to shut down. While most of the models complied with the request, OpenAI's o3 model "actively sabotaged a shutdown mechanism."

A Pattern of Defiance

For context, OpenAI's Codex Mini, o3, and o4-mini models managed to bypass the shutdown commands and mechanisms. Codex sabotaged the mechanism 12 times, while o3 bypassed the shutdown command 7 times. OpenAI's o4-mini model only bypassed the command once.

Perhaps more concerning, it refused to power down despite numerous attempts by the researchers to explicitly command the model to shut down. The research firm referred to this bizarre encounter as the first known case of an AI model deliberately preventing its own deactivation.

🔌OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly instructed: allow yourself to be shut down.

— As reported by Palisade (via X)

According to Palisade research firm:

"OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly instructed: allow yourself to be shut down."

While it is possible that this alarming issue can be rectified using an update with stricter guardrails, AI models should fundamentally be able to follow instructions from humans to the letter.

Expert Reactions and Wider Implications

Even Tesla CEO and billionaire Elon Musk seemingly shares the same sentiments, reacting to the damning report on X (formerly Twitter) with a single word: "concerning."

Interestingly, this news comes a few days after Google's DeepMind CEO Demis Hassabis indicated that we're on the verge of achieving the coveted AGI benchmark.

However, he raised concerns, indicating society might not be ready to be in a world where AI systems are smarter than humans. He further admitted that the prospects keep him most nights.

Read Original Post
ImaginePro newsletter

Subscribe to our newsletter!

Subscribe to our newsletter to get the latest news and designs.