Google and OpenAI Unveil The Next Wave of AI Agents
Google Supercharges Gemini as a Powerful AI Agent
Google has rolled out a significant upgrade to its AI capabilities, releasing a pair of features that make Gemini a more versatile AI agent. The first major announcement is Gemini 2.5 Computer Use, a specialized model designed to operate user interfaces to complete tasks. It lets developers build AI agents that interact with browsers and mobile apps through clicks, scrolls, and text input. Google reports a 79.9% success rate on the WebVoyager benchmark, ahead of competing models, while maintaining low latency. Developers can now access the Gemini Computer Use model through the Gemini API.
Complementing this, Google’s open-source coding agent, Gemini CLI, now supports extensions. This enhancement allows Gemini CLI to connect with a wide range of third-party tools, enabling developers to personalize their workflows using custom playbooks and user-defined extensions. Google has already partnered with major companies like Figma, Shopify, and Stripe, which have developed extensions for the launch.
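The interaction pattern described for Computer Use (an agent that drives an interface through clicks, scrolls, and text input) can be sketched as a simple observe-and-act loop. The snippet below is only an illustration under stated assumptions: it does not call the Gemini API, and every name in it (`Click`, `Scroll`, `TypeText`, `FakeBrowser`, `run_agent`, `scripted_policy`) is hypothetical. A real agent would replace the scripted policy with a model call and the fake browser with an actual automation layer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union


# Hypothetical action types mirroring the three interactions named above.
@dataclass(frozen=True)
class Click:
    x: int
    y: int


@dataclass(frozen=True)
class Scroll:
    dy: int


@dataclass(frozen=True)
class TypeText:
    text: str


Action = Union[Click, Scroll, TypeText]


@dataclass
class FakeBrowser:
    """In-memory stand-in for a real browser; it only records what was done."""
    scroll_y: int = 0
    typed: str = ""
    log: List[str] = field(default_factory=list)

    def apply(self, action: Action) -> None:
        if isinstance(action, Click):
            self.log.append(f"click({action.x},{action.y})")
        elif isinstance(action, Scroll):
            self.scroll_y += action.dy
            self.log.append(f"scroll({action.dy})")
        elif isinstance(action, TypeText):
            self.typed += action.text
            self.log.append(f"type({action.text!r})")
        else:
            raise TypeError(f"unsupported action: {action!r}")


def run_agent(
    browser: FakeBrowser,
    policy: Callable[[FakeBrowser], Optional[Action]],
    max_steps: int = 10,
) -> int:
    """Ask the policy for one action at a time until it returns None."""
    steps = 0
    for _ in range(max_steps):
        action = policy(browser)  # a real agent would query a model here
        if action is None:
            break
        browser.apply(action)
        steps += 1
    return steps


def scripted_policy(browser: FakeBrowser) -> Optional[Action]:
    """Placeholder policy: replays a fixed script instead of calling a model."""
    script: List[Action] = [Click(120, 40), TypeText("gemini cli"), Scroll(300)]
    i = len(browser.log)
    return script[i] if i < len(script) else None
```

Calling `run_agent(FakeBrowser(), scripted_policy)` replays the three scripted actions and then stops. The `max_steps` cap is a design choice for the sketch, so that a misbehaving policy cannot loop forever.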
The Rise of Humanoid Robots: Figure AI's New Generation
In the world of robotics, Figure AI has unveiled its third-generation humanoid robot, Figure 03. The robot offers a 5-hour battery life, wireless charging, and enhanced sensors. It is powered by the Helix AI system, which integrates vision, language, and action capabilities. Figure AI is gearing up for mass production, with a factory already capable of producing 12,000 units annually.
Figure AI’s Figure 03 robot demonstrating its capabilities.
OpenAI's DevDay Extravaganza: ChatGPT Becomes an OS
OpenAI made a splash with its latest DevDay announcements, revealing significant updates across its product suite that position ChatGPT as a central operating system for AI.
- Apps in ChatGPT: OpenAI is transforming ChatGPT into a platform by introducing third-party “apps” and an Apps SDK. With launch partners like Spotify and Canva, this move allows for interactive and personalized software experiences directly within the chat interface, potentially making ChatGPT feel more like an OS.
- AgentKit Launch: The company launched AgentKit, a comprehensive toolkit for building and deploying agentic workflows. It includes a low-code Agent Builder, an embeddable ChatKit UI, and expanded evaluation tools.
- Codex Becomes Generally Available: OpenAI's coding assistant, Codex, is now generally available, complete with a new SDK, Slack integration, and enterprise-grade admin features.
- New Models in API: Developers can now access more powerful models, including GPT-5 Pro and Sora 2, through OpenAI's APIs.
- Real-Time Voice: A new GPT-realtime-mini speech-to-speech API was introduced, offering a fast, low-latency solution for high-quality voice agents.
New Frontiers in Generative AI
This week saw a flurry of new generative models pushing the boundaries of creativity and efficiency.
- Grok Imagine Adds Video and Audio: xAI’s Grok Imagine now generates short videos with synchronized audio. Known for its controversial “spicy” mode, Grok Imagine can turn an image into a video without a text prompt and is less censored than competitors, allowing some NSFW content.
- Ling-1T: Ant Group's inclusionAI lab unveiled Ling-1T, an open-weight, one-trillion-parameter Mixture-of-Experts model that sets new benchmarks for non-thinking models in complex reasoning, math, and coding.
- Jamba Reasoning 3B: AI21 Labs launched Jamba Reasoning 3B, a compact and efficient 3B-parameter model designed for speed on local and edge devices.
- Bagel's Paris Model: Bagel.com announced Paris, which it describes as the world's first open-weight diffusion model trained in a decentralized manner by merging eight smaller pre-trained models.
- Ovi for Open-Source Video: The Ovi project offers open-source video generation with synchronized audio, providing a locally executable alternative that is near parity with leading proprietary systems.
- Samsung's Tiny Recursive Model (TRM): Samsung AI presented a breakthrough in efficient reasoning with its 7M-parameter TRM. This tiny network uses recursive self-critique to achieve strong results on reasoning benchmarks, rivaling much larger models.
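For readers unfamiliar with the Mixture-of-Experts design mentioned for Ling-1T, the core idea is that a gate scores all experts for each input and only the top few are actually run. The snippet below is a generic, minimal sketch of top-k gating, not Ling-1T's actual router (the announcement summarized here does not describe it). The function names and the toy scalar "experts" are assumptions made purely for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    order = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(
    x: float,
    gate_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


# Toy experts: in a real model each would be a feed-forward network.
toy_experts = [lambda x: x, lambda x: 2 * x, lambda x: 10 * x, lambda x: -x]
toy_logits = [0.0, 1.0, 1.0, -5.0]  # hypothetical gate scores for one input
```

With these toy values, `moe_forward(3.0, toy_logits, toy_experts)` evaluates only experts 1 and 2 and skips the other two. This is why a model with a very large total parameter count can activate only a fraction of those parameters per input.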
Big Tech's Strategic Moves and Massive Investments
The AI industry continues to see massive capital investment and strategic partnerships.
- OpenAI and AMD Partnership: OpenAI and AMD announced a deal under which OpenAI will deploy 6 gigawatts of AMD Instinct GPUs. The agreement also gives OpenAI rights to purchase up to 10% of AMD.
- xAI's $20 Billion Raise: Elon Musk’s xAI is reportedly raising $20 billion, partly tied to GPU supply from Nvidia, highlighting the immense capital required for frontier AI development.
- IBM and Anthropic Team Up: IBM is partnering with Anthropic to integrate the Claude LLM into its enterprise software, aiming to boost developer productivity. Separately, Anthropic has appointed Rahul Patil as its new CTO.
The Global AI Landscape and Market Buzz
Globally, governments and financial institutions are closely watching the AI boom.
The EU has rolled out its “Apply AI” strategy, a €1 billion plan to foster homegrown AI and reduce reliance on the US and China. Meanwhile, talk of an AI bubble has returned, with institutions like the IMF and Bank of England warning of high valuations. Minneapolis Fed President Neel Kashkari expressed skepticism that AI is replacing workers yet, but noted that AI-related capital expenditures could push interest rates higher.
AI in the Public Eye: Creator Concerns and Controversies
As AI tools become more mainstream, their impact on the creative industry is sparking debate.
Taylor Swift fans sparked controversy after discovering that promotional videos for her new album appeared to be AI-generated. This comes after Swift previously voiced concerns about AI-driven misinformation.
Similarly, top YouTube creator MrBeast expressed worry about AI's threat to creators' livelihoods, calling the current trends “scary times” even as he has experimented with the technology himself. These events underscore the growing tension between AI's potential and its role in human creativity.