ChatGPT Can Now Operate Your Computer For You
OpenAI has taken its flagship AI to the next level with the launch of ChatGPT agent, an upgraded model equipped with a virtual computer and an integrated toolkit. This enhancement allows the AI to move beyond simply answering questions and start controlling your computer to complete complex, multi-step tasks on your behalf.
This move toward greater AI autonomy comes at a time of rapid advancement in the field, with Meta's researchers observing their own AI models showing signs of independent self-improvement and OpenAI preparing for the launch of GPT-5.
Your New AI Assistant: What ChatGPT Agent Can Do
With the introduction of ChatGPT agent, users can now instruct the large language model (LLM) to not only analyze data but to act on it, as detailed in an official statement from OpenAI.
For example, instead of just finding recipes for a Japanese-style breakfast, the new agent can fully plan the meal, create a shopping list, and even purchase the ingredients online for a specific number of guests. You could also command it to scan your calendar, brief you on upcoming events, or summarize a large dataset into a concise slide deck.
Despite its impressive new abilities, the model still has limitations. Like many AIs, its spatial reasoning is weak, making it unsuitable for tasks like planning physical routes. It also lacks persistent memory, meaning it processes information in the moment without the ability to reliably recall past interactions beyond the immediate context.
Under the Hood: How the Agent Works and Performs
The agent's architecture is built on three pillars derived from previous OpenAI projects. The first is 'Operator,' an agent that uses a virtual browser to search the web. The second is 'deep research,' designed to analyze and synthesize large volumes of data. The final component is the conversational and presentation prowess of previous ChatGPT versions.
Kofi Nyarko, a professor at Morgan State University, explained, "In essence, it can autonomously browse the web, generate code, create files, and so on, all under human supervision." You can find more about his work at the DEPA Research Lab.
This new combination has led to significant performance improvements. On Humanity’s Last Exam, a benchmark for expert-level AI reasoning, the agent more than doubled the accuracy of a previous version, jumping from 20.3% to 41.6%. It also showed substantial gains in the notoriously difficult FrontierMath benchmark.
The Double-Edged Sword: Acknowledging the Risks
OpenAI has openly acknowledged the dangers that come with the agent's increased autonomy. The company stated that the model has "high biological and chemical capabilities," which could potentially be misused to assist in creating chemical or biological weapons. This creates what biosecurity experts call a “capability escalation pathway,” where an AI can instantly synthesize cross-disciplinary knowledge and even help bypass security checks.
With its ability to interact with files and websites, the agent also increases the potential for data breaches, data manipulation, and financial fraud, especially in the case of a hijacking or a sophisticated prompt injection attack.
Nyarko emphasized that the agent is not yet fully autonomous and that human supervision is crucial. "Hallucinations, user interface fragility, or misinterpretation can lead to errors," he said. "Built-in safeguards, like permission prompts and interruptibility, are essential but not sufficient to eliminate risk entirely." He also noted broader concerns for AI agents, such as amplifying biases, complicating liability, and fostering psychological dependence.
Balancing Power with Precaution: OpenAI's Safety Measures
To counter the new threats, OpenAI engineers have strengthened several safeguards. These include threat modeling, training the model to refuse harmful dual-use requests, bug bounty programs, and expert red-teaming focused on biodefense. In one notable incident, a previous 'smart' AI model from OpenAI even refused a direct order to shut down.
However, external audits suggest there is still work to be done. A risk management assessment by the non-profit SaferAI labeled OpenAI's policies as 'Weak,' with a score of just 33%. Similarly, the company received a C grade on the AI Safety Index from the Future of Life Institute, a prominent AI safety organization.