Testing ChatGPT's New Agent: A Hands-On Review
The age of AI agents has arrived. ChatGPT is evolving beyond simply answering questions with text synthesized from data scraped across the web. Now, it can connect with your applications to perform real-world tasks for you, such as booking tickets, checking your calendar, or creating slideshows.
This powerful new feature is called ChatGPT Agent. It effectively gives the AI a virtual computer to work on within your chat. OpenAI describes it as a tool that can "fluidly shift between reasoning and action to handle complex workflows from start to finish." For those on a paid ChatGPT plan (starting at $20/month), this feature is available to try now. I decided to test it on a few projects to see how it holds up.
Getting Started With ChatGPT Agent Mode
Activating Agent mode is straightforward. In the web app, you click the + (plus) button next to the prompt box and select Agent mode. You're then asked to describe the task you want the AI to perform. There are no strict rules for your prompt, but on-screen suggestions range from summarizing the news to ordering groceries.
How Agent Mode Works Under the Hood
Once you provide a task, the Agent may ask clarifying questions. The interface looks similar to a normal chat, but with a key difference: an embedded window that shows you what ChatGPT is doing on its virtual computer.
This isn't a live video feed, but rather a graphical representation of the AI's actions. You can take control at any moment to see exactly what's happening, switch to a text-only activity feed, or stop the agent entirely if it goes off track.
When the task is complete, ChatGPT Agent provides a summary report and a list of sources. However, the process can be slow, much like ChatGPT's Deep Research tool. This means you'll likely set it and forget it, which requires a significant amount of trust in the AI's ability to handle the job without constant supervision.
A Hands-On Test: The Birthday Party and the Spreadsheet
For my first test, I asked the Agent to plan a small, low-key birthday party. I provided my age, party preferences, desired venue type, and potential dates, and also asked it to create some invitations. The bot did a great job, identifying the same local venues I would have chosen myself. It struggled with some details, such as failing to open PDFs to get booking information, but the final report included a helpful comparison chart and contact details.
Next, I tasked it with creating a spreadsheet of all iPhone launch dates—a genuine time-saver for my work. The Agent impressively identified reliable sources like Wikipedia, Apple's press releases, and MacRumors. The final spreadsheet was accurate, although it lacked the formatting I requested and the sources column was not well-organized. It also took a considerable amount of time, arguably as long as it would have taken me to do it manually, though my time was free for other things.
The Verdict: Is ChatGPT Agent Worth It?
I'm impressed with how slick and capable ChatGPT Agent is. While it wasn't perfect, it mostly took the right steps and demonstrated a good level of transparency. You can always monitor its progress and intervene when needed.
However, I'm personally hesitant to rely on it for critical tasks. I worry too much about the AI making a mistake, missing a key detail, or misunderstanding a nuance—a common issue with models that can sometimes provide wildly incorrect responses. Your tolerance for these risks may differ, and I expect many will find the time-saving benefits well worth overlooking minor flaws.
Disclosure: Lifehacker’s parent company, Ziff Davis, filed a lawsuit against OpenAI in April, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.