The Broken Promises of AI Personal Assistants
The Grand Promise of AI Agents
While chatbots are designed to answer your questions, AI agents were promised as the next leap forward—proactive assistants meant to manage your life and business. Imagine an AI that could shop for you, book travel, organize your calendar, and even maintain complex software systems. This was the future that, by the end of 2024, tech giants like Google, OpenAI, and Anthropic declared was just around the corner.
The industry's leaders set high expectations. Anthropic's Dario Amodei told the Financial Times that by 2025, AI would be capable of tasks typically handled by a PhD student. OpenAI's CEO Sam Altman echoed this sentiment in a blog post, predicting that AI agents would “join the workforce” in 2025. Google also announced its own Project Astra, aiming to build a universal AI assistant.
Tech commentators were quick to amplify the excitement.
But as we move through the year, a different question emerges: do they actually work? Skeptics, including the author of the original piece, predicted that AI agents would be hyped throughout 2025 but remain far from reliable except in very specific scenarios. With several months left, let's look at where things stand.
Hype vs. Reality: A Mid-Year Report Card
The big companies have indeed launched their agents. OpenAI released its ChatGPT agent, and Google's Astra is available to a waitlist of testers. The problem is that none of them appear to be reliable for mainstream use.
OpenAI admits as much, stating that while its agent can handle complex tasks, it “can still make mistakes” and “introduces new risks” because it works directly with user data. In practice, such errors have severely limited these agents' usefulness, and the boundary-pushing future we were promised remains largely absent.
By March, the cracks were already showing. A Fortune story reported that “some AI agent customers say reality doesn’t match the hype,” a sentiment that has only intensified.
Recent reports paint a picture of widespread disappointment.
Even AI enthusiasts are struggling to find value. One popular influencer recently posted on X, asking, “serious question: do any of you use the ChatGPT agent? I just can’t find a use-case that matches its (limited) capabilities.” This was echoed by Jeremy Kahn at Fortune, who noted the gap between impressive demos and real-world performance.
Why Are AI Agents Faltering? The Core Problems
Several fundamental issues plague the current generation of AI agents. For starters, AI coding agents are creating significant technical debt by producing large amounts of difficult-to-maintain code. As one MIT professor noted, AI is like a “brand new credit card... to accumulate technical debt in ways we were never able to do before.”
Another major issue is that errors tend to compound over time. A benchmark created by Penrose.com for account balance tracking found that AI mistakes grew worse over a one-year period. This suggests that the longer an agent works on a task, the more unreliable it can become.
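To see why this happens, consider a toy simulation. This is not the Penrose benchmark itself; the error rate, transaction volume, and dollar amounts below are invented purely for illustration. The point is structural: if an agent occasionally mis-records a transaction and nothing ever forces a reconciliation, the gap between the true balance and the agent's ledger tends to widen as the year goes on.

```python
import random

# Toy sketch of compounding bookkeeping errors (invented numbers,
# not the Penrose benchmark). A hypothetical agent mis-records each
# transaction with probability p_error, off by up to $10 either way.
# Because no step ever reconciles the ledger, early mistakes persist
# and new ones pile on top of them.

def balance_drift_by_month(days=365, txns_per_day=5, p_error=0.01, seed=0):
    rng = random.Random(seed)
    drift = 0.0  # running gap between true and agent-tracked balance
    monthly_drift = []
    for day in range(1, days + 1):
        for _ in range(txns_per_day):
            if rng.random() < p_error:
                drift += rng.uniform(-10.0, 10.0)
        if day % 30 == 0:
            monthly_drift.append(round(abs(drift), 2))
    return monthly_drift

print(balance_drift_by_month())
# In repeated runs, the absolute drift tends to grow over the year:
# a mistake made in January is still corrupting the ledger in December.
```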
The persistent problem of hallucinations—where the AI simply makes things up—also remains a major hurdle.
An insider in the AI agent industry posted a critique on Reddit highlighting the massive gap between polished demos and real-world reliability, a problem they expect will persist for years.
The Security Nightmare of Unreliable Agents
Agents are not just unreliable; their superficial understanding also makes them extremely vulnerable to cyberattacks. A recent multi-team effort reported by CMU PhD student Andy Zou revealed alarming security flaws.
Even a 1.45% failure rate is devastating in a security context: across the large number of attack attempts in the study, it translated into thousands of successful attacks, and at that rate a million attempts would yield roughly 14,500 breaches. For a critical system, any exploitable vulnerability is a massive risk.
The Fundamental Flaw: Mimicry Without Understanding
These flaws are not surprising when you consider how today's AI agents work. They are driven by Large Language Models (LLMs) that excel at mimicry, not genuine understanding. They can replicate the patterns of human language used to complete tasks, but they have no real concept of what it means to delete a database or fabricate a calendar entry. Their actions are a matter of statistical probability, not comprehension.
Because agentic tasks often involve multiple steps, this unreliability is magnified. With each step, there's another chance for a catastrophic error. This is a fundamental limitation that cannot be easily patched.
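A quick back-of-the-envelope calculation makes the compounding concrete. The reliability figures below are assumptions chosen to illustrate the arithmetic, not measurements from any of the reports above: if each step succeeds independently, per-step reliability multiplies across the task.

```python
# If each step of an agentic task succeeds independently with
# probability r, an n-step task completes cleanly with probability r**n.
# The figures below are assumed for illustration only.
for r in (0.99, 0.95, 0.90):
    for n in (10, 20, 50):
        print(f"per-step reliability {r:.0%}, {n:>2} steps -> {r**n:6.1%}")

# Even at 99% per-step reliability, a 50-step task succeeds only about
# 60% of the time; at 95%, it succeeds less than 8% of the time.
```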
A Misguided Investment: Are We Pouring Money Down the Drain?
AI agents will likely become transformative one day, but it's doubtful that LLMs as they currently exist will be the foundation. A recent scoop from The Information confirms that pure scaling is hitting a point of diminishing returns; the upcoming GPT-5 is not expected to be the same massive leap over GPT-4 that GPT-4 was over GPT-3.
Another recent paper argued that LLMs are confronting a wall, echoing the author's long-held concerns.
Truly reliable and trustworthy AI agents will likely require a different approach, such as neurosymbolic AI combined with rich world models. However, the industry continues to pour staggering amounts of money into the current, flawed LLM-centric path. Alternative approaches receive a tiny fraction of this investment.
Perhaps after enough high-profile failures, reality will finally set in. Until then, it's best not to trust an AI agent with your calendar, let alone your credit card.