Azure Supercharges AI With New Multimodal Tools

2025-10-07 • Steve Sweetman, Naomi Moneypenny • 5-minute read

Imagine a single platform where any developer, from a startup founder to an enterprise engineer, can access the complete range of AI capabilities including text, images, audio, and video. At this year's OpenAI DevDay, Azure AI Foundry is turning that vision into a reality. With the launch of OpenAI GPT-image-1-mini, GPT-realtime-mini, and GPT-audio-mini, alongside significant safety enhancements for GPT-5, developers now have the ultimate toolkit to build, test, and scale multimodal solutions more quickly and affordably than ever before. We are thrilled to announce that the models revealed by OpenAI will be available in Azure AI Foundry, with most customers gaining access starting October 7, 2025.

This announcement follows other major innovations, including the preview release of the Microsoft Agent Framework, multi-agent workflows in private preview, unified observability, and new Responsible AI features. The Microsoft Agent Framework, an open-source SDK available on GitHub, combines the enterprise-ready foundations of Semantic Kernel with the multi-agent power of AutoGen, enabling developers to create intelligent and scalable agentic solutions with confidence.

By integrating the latest OpenAI models and advancing our agentic frameworks, we provide customers with unmatched choice and flexibility, empowering them to build intelligent systems that solve complex business challenges and drive innovation.

Meet the New AI Models

GPT-image-1-mini: Compact Power for Visual Creativity

GPT-image-1-mini is designed for developers who need fast and efficient image generation at scale. Its compact design allows for high-quality text-to-image and image-to-image creation while using fewer resources, making multimodal AI accessible even in resource-constrained environments. Built on the Image-1 model, it ensures consistency and easy adoption for those already using multimodal AI in Azure AI Foundry.

What makes it special?

Flexible Image Generation: Create high-quality text-to-image and image-to-image content affordably.
Lightning-Fast Inference: Generate images in real time, integrating smoothly with existing workflows.

Use Cases:

Creating educational materials for online learning.
Designing illustrations for storybooks and visual narratives.
Producing game assets for rapid prototyping.
Speeding up UI design for applications and websites.
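
As a rough illustration of the text-to-image flow described above, the sketch below builds (but does not send) an HTTP request against an Azure OpenAI image generation deployment. The resource endpoint, deployment name, API key, and api-version value are placeholders and assumptions, not values from this announcement; check your own Azure AI Foundry resource for the real ones.

```python
import json
import urllib.request


def build_image_request(endpoint, deployment, api_key, prompt,
                        api_version, size="1024x1024"):
    """Build a POST request for an image generation deployment.

    All arguments are supplied by the caller; nothing here is taken
    from the announcement itself.
    """
    url = (
        f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
        f"/images/generations?api-version={api_version}"
    )
    body = json.dumps({"prompt": prompt, "size": size, "n": 1}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical values for illustration only.
req = build_image_request(
    endpoint="https://example-resource.openai.azure.com",
    deployment="my-gpt-image-1-mini",   # assumed deployment name
    api_key="<your-api-key>",
    prompt="A watercolor diagram of the water cycle for a grade 5 lesson",
    api_version="2025-04-01-preview",   # assumed; use the version your resource supports
)
print(req.full_url)
```

Passing the request to urllib.request.urlopen would send it; the response format depends on the model and API version, so it is not shown here.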

Table 1: GPT-image-1-mini pricing and deployment in Azure AI Foundry (per 1M tokens)*

Table with pricing information.
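
Because pricing is quoted per 1M tokens, estimating the cost of a workload is simple arithmetic. The helper below is a minimal sketch; the function name is our own, and the rate used in the example is a made-up placeholder rather than a price from the table.

```python
def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Return the cost of `tokens` tokens at a rate quoted per 1M tokens."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical rate of 2.00 (currency units per 1M tokens), for illustration only.
print(estimate_cost(250_000, 2.00))
```

Input and output tokens are typically priced separately, so a real estimate would call the helper once per rate and sum the results.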

GPT-realtime-mini and GPT-audio-mini: Efficient and Affordable Voice Solutions

These two new mini models deliver fast, cost-effective multimodal AI without compromising on quality. They are lightweight and highly optimized for real-time voice interaction and audio generation with minimal latency, making them ideal for applications where speed is crucial. By consuming fewer resources, these models help businesses reduce operational costs while scaling their multimodal capabilities.

What makes them special?

Real-Time Responsiveness: Power chatbots, assistants, and translation tools with near-zero latency.
Resource-Light: Run advanced voice and audio models on minimal infrastructure.
Affordable Scaling: Lower operational costs while expanding multimodal features.

Use Cases:

Voice-based chatbots for customer support.
Real-time translation for global communication.
Dynamic audio content creation for media.
Interactive voice assistants for consumer and enterprise apps.

GPT-realtime-mini in Azure AI Foundry enables our customers to build voice solutions with lower latency, better instruction adherence, and cost efficiency—capabilities our customers value, driving shorter handle times, smoother dialogues, and faster time-to-value.

— Andy O’Dower, VP of Product, Twilio

Table 2: GPT-realtime-mini and GPT-audio-mini pricing and deployment in Azure AI Foundry (per 1M tokens)*

Table with pricing information.

GPT-5-chat-latest: Raising the Bar for Safety and Wellbeing

The latest update to GPT-5-chat-latest introduces more robust safety guardrails to better protect users during sensitive conversations. With enhanced detection and response capabilities, the model can more effectively manage dialogue that could lead to emotional distress. These improvements reflect our commitment to responsible AI, ensuring every interaction is not only helpful but also safe and supportive.

Table 3: GPT-5-chat-latest pricing and deployment in Azure AI Foundry (per 1M tokens)*

Table with pricing information.

GPT-5-pro: The Pinnacle of Reasoning and Analytics

GPT-5-pro represents the peak of advanced reasoning and analytics in the Azure AI Foundry ecosystem. Its tournament-style architecture uses multiple reasoning pathways to ensure maximum accuracy, making it ideal for complex analytics, code generation, and decision-making. With Azure AI Foundry, organizations can unlock the full potential of GPT-5-pro to drive smarter decisions and accelerate innovation securely.

Table 4: GPT-5-pro pricing and deployment in Azure AI Foundry (per 1M tokens)*

Table with pricing information.

Empowering Developers to Build and Ship Faster

With these new models, Azure AI Foundry is setting the pace for AI innovation. Developers can now move beyond text to harness image and audio generation, creating richer and smarter workflows that drive progress in every industry, from education and gaming to enterprise automation.

A Sneak Peek at What's Next: Sora 2

More advancements are on the horizon. Sora 2 is coming soon to Azure AI Foundry, offering advanced video and audio generation through a single API. Imagine creating physics-driven animations, synchronized dialogue, and cameo features, all accessible to developers. Stay tuned for the next generation of immersive, generative experiences.

Are you ready to build the next wave of multimodal AI? Azure AI Foundry is your platform for every possibility.

*Pricing is accurate as of October 2025.
