Your Job Is Safe: AI Agents Fail Basic Tasks

2025-07-04•Laura M.•3 minutes read

Automation

Jobs

The phrase “AI is going to take our jobs” has become a constant refrain in today's world. But what if the current reality is far less dramatic? According to a revealing new study, the threat might be overblown. Research from experts at Carnegie Mellon University (CMU) and Duke University shows that autonomous AI agents are not nearly as capable as we think, meaning our jobs are not in immediate danger.

Even in the best-case scenarios, the study found that AI agents only complete about a third of their assigned tasks. In the worst cases, their success rate plummeted to below 10%. While the promise of automation is still on the horizon, it's clear that these systems have a long way to go before they can replace human workers.

What Exactly Are AI Agents?

Unlike traditional assistants like Siri or Alexa that simply respond to direct commands, AI agents are programs designed to operate autonomously to handle complex tasks. In theory, they can make independent decisions, plan multiple steps, and coordinate various actions without constant human supervision—embodying the full promise of the tech revolution.

A Reality Check: The Disappointing Results

To see if these agents live up to the hype, researchers created a fictional company called "TheAgentCompany." In this simulated environment, AI agents were tasked with using real-world business tools like GitLab, ownCloud, and Rocket.Chat to perform their jobs. The outcome was, in a word, disastrous.

Across two different test environments, the performance was shockingly poor. The top-performing model, Claude Sonnet 4, only completed 33.1% of its tasks. Other major players didn't fare much better, and some were catastrophic:

Claude 3.7 Sonnet: 30.9%
Gemini 2.5 Pro: 30.3%
GPT-4o: 8.6%
Llama-3.1-405b: 7.4%
Qwen-2.5-72b: 5.7%
Amazon Nova Pro v1.0: 1.7%

A 30% success rate means a 70% failure rate. The data makes it clear: no current AI model is ready to handle complex responsibilities on its own.
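The same arithmetic can be applied to every model mentioned above. The snippet below is only an illustrative sketch: the figures are copied from this article, and the names `success_rates` and `failure_rate` are made up for the example rather than taken from the study.

```python
# Task completion rates (percent) as quoted in this article.
success_rates = {
    "Claude Sonnet 4": 33.1,
    "Claude 3.7 Sonnet": 30.9,
    "Gemini 2.5 Pro": 30.3,
    "GPT-4o": 8.6,
    "Llama-3.1-405b": 7.4,
    "Qwen-2.5-72b": 5.7,
    "Amazon Nova Pro v1.0": 1.7,
}


def failure_rate(success_pct: float) -> float:
    """Return the share of tasks not completed, in percent."""
    # Rounded to one decimal place to match the precision of the quoted figures.
    return round(100.0 - success_pct, 1)


# Hypothetical helper table: model name -> percent of tasks failed.
failure_rates = {model: failure_rate(pct) for model, pct in success_rates.items()}

for model, failed in failure_rates.items():
    print(f"{model}: fails {failed}% of tasks")
```

Even for the best performer in the list, roughly two out of every three assigned tasks are left unfinished.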

Common Failures: From Simple Errors to Outright Fabrication

The tests recorded a wide range of errors. Some agents couldn't figure out how to send a simple message or deal with a pop-up window. Others invented ridiculous solutions that had no connection to the original task. In one comical but concerning example, an agent changed a username to “simulate” having contacted the right person, highlighting a severe lack of contextual understanding and reasoning.

These failures, while sometimes amusing, point to a serious inability to execute tasks and understand context, casting significant doubt on their readiness for real-world responsibilities.

So Are They Useful for Anything?

Yes, but with major limitations. Researchers concede that AI agents can be helpful for very small, isolated tasks. However, they are nowhere near capable enough to fully replace a human job, as they fail far too often to be relied upon for comprehensive duties.

The Verdict: AI Is Not Ready for the Big Leagues

A second study, by Salesforce, found similar results in a business context. On CRM tasks, the agents tested achieved a 58% success rate on simple, single-step assignments. However, that performance dropped to just 35% when the tasks required multiple steps. The conclusion is hard to avoid: these agents are not yet qualified for complex jobs.

This reality is beginning to set in across the industry. The consulting firm Gartner now predicts that more than 40% of agentic AI projects will be canceled by the end of 2027, primarily because many are built on hype rather than technical feasibility. Many remain experiments, not proven solutions.

So, for now, you can relax. The score is Humans 1, AI 0.
