Back to all posts

The Broken Promises of AI Personal Assistants

2025-08-04Gary Marcus6 minutes read
Artificial Intelligence
AI Agents
Tech Trends

The Grand Promise of AI Agents

While chatbots are designed to answer your questions, AI agents were promised as the next leap forward—proactive assistants meant to manage your life and business. Imagine an AI that could shop for you, book travel, organize your calendar, and even maintain complex software systems. This was the future that, by the end of 2024, tech giants like Google, OpenAI, and Anthropic declared was just around the corner.

The industry's leaders set high expectations. Anthropic's Dario Amodei told the Financial Times that by 2025, AI would be capable of tasks typically handled by a PhD student. OpenAI's CEO Sam Altman echoed this sentiment in a blog post, predicting that AI agents would “join the workforce” in 2025. Google also announced its own Project Astra, aiming to build a universal AI assistant.

Tech commentators were quick to amplify the excitement.

A quote from tech columnist Kevin Roose about AI agents pushing the boundaries of what's possible.

But as we move through the year, a different question emerges: do they actually work? Skeptics, including the author of the original piece, predicted that AI agents would be hyped throughout 2025 but remain far from reliable except in very specific scenarios. With several months left, let's look at where things stand.

Hype vs. Reality: A Mid-Year Report Card

The big companies have indeed launched their agents. OpenAI released its ChatGPT agent, and Google's Astra is available to a waitlist of testers. The problem is that none of them appear to be reliable for mainstream use.

OpenAI admits as much, stating that while its agent can handle complex tasks, it “can still make mistakes” and “introduces new risks” because it works directly with user data. In practice, these errors have severely limited their usefulness, and the boundary-pushing future we were promised feels largely absent.

By March, the cracks were already showing. A Fortune story reported that “some AI agent customers say reality doesn’t match the hype,” a sentiment that has only intensified.

Recent reports paint a picture of widespread disappointment.

An image of a report showing AI agents are failing and the failure rate is painful.

Even AI enthusiasts are struggling to find value. One popular influencer recently posted on X, asking, “serious question: do any of you use the ChatGPT agent? I just can’t find a use-case that matches its (limited) capabilities.” This was echoed by Jeremy Kahn at Fortune, who noted the gap between impressive demos and real-world performance.

A quote from Jeremy Kahn at Fortune highlighting that while AI agent demos look great, they often fail at basic tasks in the real world.

Why Are AI Agents Faltering? The Core Problems

Several fundamental issues plague the current generation of AI agents. For starters, AI coding agents are creating significant technical debt by producing large amounts of difficult-to-maintain code. As one MIT professor noted, AI is like a “brand new credit card... to accumulate technical debt in ways we were never able to do before.”

Another major issue is that errors tend to compound over time. A benchmark created by Penrose.com for account balance tracking found that AI mistakes grew worse over a one-year period. This suggests that the longer an agent works on a task, the more unreliable it can become.

A chart from Penrose.com showing that AI agent errors in accounting tasks compound and increase over time.

The persistent problem of hallucinations—where the AI simply makes things up—also remains a major hurdle.

An example of an AI agent hallucinating and providing incorrect information.

An insider in the AI agent industry posted a critique on Reddit highlighting the massive gap between polished demos and real-world reliability, a problem they expect will persist for years.

A Reddit post from an AI industry insider explaining the huge gap between AI agent demos and real-world reliability.

The Security Nightmare of Unreliable Agents

Beyond just being unreliable, the superficial understanding of current agents makes them extremely vulnerable to cyberattacks. A recent multi-team effort reported by CMU PhD student Andy Zou revealed alarming security flaws.

A report summary showing that AI agents are vulnerable to prompt injection and other attacks, with even the best systems being successfully attacked over 1.45% of the time.

Even a 1.45% failure rate is devastating for security, as it means thousands of attacks were successful. For a critical system, any vulnerability is a massive risk.

The Fundamental Flaw: Mimicry Without Understanding

These flaws are not surprising when you consider how today's AI agents work. They are driven by Large Language Models (LLMs) that excel at mimicry, not genuine understanding. They can replicate the patterns of human language used to complete tasks, but they have no real concept of what it means to delete a database or fabricate a calendar entry. Their actions are a matter of statistical probability, not comprehension.

Because agentic tasks often involve multiple steps, this unreliability is magnified. With each step, there's another chance for a catastrophic error. This is a fundamental limitation that cannot be easily patched.

A Misguided Investment: Are We Pouring Money Down the Drain?

AI agents will likely become transformative one day, but it's doubtful that LLMs as they currently exist will be the foundation. A recent scoop from The Information confirms that pure scaling is hitting a point of diminishing returns; the upcoming GPT-5 is not expected to be the same massive leap over GPT-4 that GPT-4 was over GPT-3.

A report from The Information stating that OpenAI is struggling with GPT-5, which is not expected to be a major leap, and that the company is looking beyond scaling for progress.

Another recent paper argued that LLMs are confronting a wall, echoing the author's long-held concerns.

A paper arguing that large language models are confronting a cognitive 'wall' that limits their performance on reasoning tasks.

Truly reliable and trustworthy AI agents will likely require a different approach, such as neurosymbolic AI combined with rich world models. However, the industry continues to pour staggering amounts of money into the current, flawed LLM-centric path. Alternative approaches receive a tiny fraction of this investment.

Perhaps after enough high-profile failures, reality will finally set in. Until then, it's best not to trust an AI agent with your calendar, let alone your credit card.

Read Original Post
ImaginePro newsletter

Subscribe to our newsletter!

Subscribe to our newsletter to get the latest news and designs.