Alibaba Qwen AI Now Edits Images Like A Pro
Alibaba has released a major update to its Qwen image model, adding a suite of editing tools that handles both minor visual tweaks and major semantic transformations.
A Leap Forward in AI Image Editing
The new model, Qwen-Image-Edit, builds on Alibaba's 20-billion-parameter Qwen-Image model. It processes the input image along two paths: Qwen2.5-VL handles semantic understanding and control, while a Variational Autoencoder (VAE) handles visual appearance and fidelity. This combination lets the system cover a wide range of edits, from simple touch-ups to complex conceptual changes.
Two Modes for Creative Control
Qwen-Image-Edit offers two distinct workflows to suit different creative needs:
- Appearance Editing: This mode lets users make precise changes to specific areas of an image while the rest of the composition remains untouched. It suits tasks like removing stray hairs, editing clothing, or changing background elements.
- Semantic Editing: This mode modifies pixels across the entire image to implement a new concept, such as changing the style or rotating an object, while preserving the identity and consistency of the main subject.
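The difference between the two modes can be made concrete with a small check. The sketch below is not how Qwen-Image-Edit works internally. It assumes images are plain nested lists of pixel values and uses hypothetical helper names (`changed_pixels`, `is_appearance_edit`) to test whether an edit stayed inside a marked region, which is the property appearance editing is described as guaranteeing.

```python
from typing import List, Set, Tuple

Image = List[List[int]]             # toy grayscale image: rows of pixel values
Region = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def changed_pixels(before: Image, after: Image) -> Set[Tuple[int, int]]:
    """Return the (row, col) coordinates whose value differs between two images."""
    return {
        (r, c)
        for r, row in enumerate(before)
        for c, value in enumerate(row)
        if after[r][c] != value
    }


def is_appearance_edit(before: Image, after: Image, region: Region) -> bool:
    """True if every changed pixel lies inside the region (the rest is untouched)."""
    top, left, bottom, right = region
    return all(
        top <= r < bottom and left <= c < right
        for r, c in changed_pixels(before, after)
    )


original = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
local_edit = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]   # only the centre pixel changed
global_edit = [[5, 5, 5], [5, 9, 5], [5, 5, 5]]  # every pixel changed

print(is_appearance_edit(original, local_edit, (1, 1, 2, 2)))   # True
print(is_appearance_edit(original, global_edit, (1, 1, 2, 2)))  # False
```

A semantic edit such as a style change would fail this check by design, since it is expected to alter pixels across the whole image.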
From Mascot Creation to Style Transfer
To showcase its semantic editing prowess, Alibaba demonstrated how the model can generate new intellectual property (IP) content featuring its Capybara mascot. Even with significant pixel changes across the image, the character remains instantly recognizable in various new roles and styles.
Qwen-Image-Edit generates new versions of the Capybara mascot that can be used as stickers in messenger apps and other formats. | Image: Alibaba
Other creative applications include generating new perspectives with 90- or 180-degree object rotations and performing style transfers, such as transforming a standard portrait into a Studio Ghibli-inspired avatar.
The model generates new viewpoints for people, animals, and objects. | Image: Alibaba
Intelligent Object and Background Manipulation
The model's capabilities extend to complex interactions within an image. It can add new objects, like a wooden sign in front of a penguin colony, and render matching shadows and reflections, which suggests it accounts for lighting and environmental context.
Qwen-Image-Edit places a wooden sign reading "Welcome to Penguin Beach" in front of a penguin colony and generates natural shadows. | Image: Alibaba
Advanced Bilingual Text Editing
One of the standout features of Qwen-Image-Edit is its ability to edit text in both Chinese and English directly within images. The system can add, remove, or modify text while preserving the original font, size, and style, as seen in an example where Scrabble tiles are changed from "Health Insurance" to "Financial Planning."
Qwen-Image-Edit updates Scrabble tiles from "Health Insurance" to "Financial Planning," maintaining the original look. | Image: Alibaba
For corrections, users can draw bounding boxes around incorrect or unwanted text, and the model updates the selected areas. The model can struggle with rare characters, but it supports step-by-step refinement: users mark specific spots for further edits and repeat until the result is correct.
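The mark-and-fix loop described above can be sketched as follows. This is only an illustration under stated assumptions: the image is modelled as a grid of characters, `Box` and `refine` are hypothetical names, and `fix_region` stands in for a call to the editing model, whose real interface is not described here.

```python
from dataclasses import dataclass
from typing import Callable, List

Grid = List[List[str]]  # toy "image": a grid of characters standing in for rendered text


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box; bottom and right are exclusive."""
    top: int
    left: int
    bottom: int
    right: int


def refine(grid: Grid, boxes: List[Box], fix_region: Callable[[Grid, Box], Grid]) -> Grid:
    """Apply one correction per marked box, leaving cells outside each box unchanged."""
    current = [row[:] for row in grid]  # copy so the caller's grid is not mutated
    for box in boxes:
        proposed = fix_region(current, box)  # stand-in for the model's edit
        for r in range(box.top, box.bottom):
            for c in range(box.left, box.right):
                current[r][c] = proposed[r][c]
    return current


def uppercase_fix(grid: Grid, box: Box) -> Grid:
    """Toy stand-in for the model: uppercases everything; refine() keeps only the box."""
    return [[ch.upper() for ch in row] for row in grid]


image = [list("helo"), list("wrld")]
step1 = refine(image, [Box(0, 0, 1, 2)], uppercase_fix)  # first marked area
step2 = refine(step1, [Box(1, 2, 2, 4)], uppercase_fix)  # second pass on another spot
print(["".join(row) for row in step2])  # ['HElo', 'wrLD']
```

Each pass changes only the cells inside the marked box, so earlier corrections survive later passes.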
The tool replaces incorrect characters and lets users directly mark the areas that need changes. | Image: Alibaba
Availability and Industry Context
Alibaba claims that Qwen-Image-Edit achieves state-of-the-art performance on public image editing benchmarks. The model is now accessible through the "Image Editing" feature in Qwen Chat and is also available for developers on GitHub, Hugging Face, and ModelScope.
This release marks a significant advancement in the field of targeted image editing, an area where AI models have historically struggled. It demonstrates how quickly the technology is moving beyond simple generation to provide nuanced and precise creative control.