Qwen-Image: A Breakthrough in Native Text Rendering and Precise Image Editing
We are excited to introduce Qwen-Image, a 20-billion-parameter image foundation model based on the MMDiT architecture. It delivers significant advances in complex text rendering and precise image editing. To try the latest capabilities, visit Qwen Chat and select the “Image Generation” feature.

Performance

Qwen-Image has been rigorously evaluated on multiple public benchmarks. It achieves state-of-the-art results on general image generation benchmarks, including GenEval, DPG, and OneIG-Bench, as well as on image editing benchmarks such as GEdit, ImgEdit, and GSO. It is especially strong at text rendering, outperforming existing models on LongText-Bench, ChineseWord, and TextCraft. Its exceptional performance in Chinese text generation sets it apart, making it a leading model for both general visual generation and high-precision text rendering.

Demo

One of Qwen-Image’s standout abilities is rendering text with high fidelity across diverse scenarios.

Chinese Text Rendering Example: A scene in the style of Studio Ghibli, captured from a flat angle, shows a bustling ancient street bathed in sunlight (the quoted in-image text is given here in English translation). A young disciple in a blue robe stands at the center, holding a card that reads “Alibaba Cloud.” Two children look at him in surprise. On the left, a shop displays a sign reading “Cloud Storage,” with glowing server racks inside and two guards at the door. On the right, one shop bears the sign “Cloud Computing,” where a woman in a qipao gazes at a shimmering computer screen. Another shop displays the sign “Cloud Model,” with a large glowing wine barrel labeled “Qwen” and a shopkeeper pouring luminous code into it. The model renders every text element accurately, maintains depth of field, and preserves the characters’ poses and expressions with remarkable realism.

Another Chinese Example: A classical Chinese couplet hangs in a serene, traditional room.
The left scroll reads “义本生知人机同道善思新,” the right scroll reads “通云赋智乾坤启数高志远,” and the horizontal scroll above reads “智启通义.” The calligraphy is elegant, and the central painting depicts Yueyang Tower. The model faithfully reproduces the text in an authentic calligraphic style and renders the scene with lifelike detail, down to the realistic blue-and-white porcelain on the table.

English Text Rendering Example: A bookstore window display features a sign reading “New Arrivals This Week.” Below it, a shelf tag says “Best-Selling Novels Here.” A colorful poster advertises “Author Meet And Greet on Saturday,” with a portrait of the author in the center. Four books are displayed with their titles: “The light between worlds,” “When stars are scattered,” “The silent patient,” and “The night circus.” The model renders all of the text accurately, including the book titles and labels.

Complex English Layout Example: An elegant infographic slide titled “Habits for Emotional Wellbeing” has six sections, each with a title, an icon, and descriptive text. The layout is symmetrical and artistic, with floral patterns framing the content. The sections are “Practice Mindfulness” with a lotus icon, “Cultivate Gratitude” with an open hand, “Stay Connected” with a chat bubble, “Prioritize Sleep” with a crescent moon, “Regular Physical Activity” with a runner, and “Continuous Learning” with a book. The model arranges all elements with clarity and visual harmony.

Small Text Example: A man in a suit stands by a window, gazing at the moon. He holds a yellowed sheet of paper with handwritten text: “A lantern moon climbs through the silver night, Unfurling quiet dreams across the sky, Each star a whispered promise wrapped in light, That dawn will bloom, though darkness wanders by.” A cat sits on the windowsill. Despite the small text size and the length of the passage, the model renders it accurately and legibly.
High-Density Text Example: A Chinese woman wearing a “QWEN” T-shirt smiles at the camera, holding a black marker. Behind her, a glass panel displays a handwritten paragraph in Chinese: “一、Qwen-Image的技术路线: 探索视觉生成基础模型的极限,开创理解与生成一体化的未来。二、Qwen-Image的模型特色:1、复杂文字渲染。支持中英渲染、自动布局; 2、精准图像编辑。支持文字编辑、物体增减、风格变换。三、Qwen-Image的未来愿景:赋能专业内容创作、助力生成式AI发展.” (In English: “1. Qwen-Image’s technical roadmap: exploring the limits of visual generation foundation models and pioneering a future that unifies understanding and generation. 2. Qwen-Image’s model features: (1) complex text rendering, with support for Chinese and English rendering and automatic layout; (2) precise image editing, with support for text editing, object addition and removal, and style transfer. 3. Qwen-Image’s future vision: empowering professional content creation and advancing generative AI.”) The model renders the entire passage completely and precisely.

Bilingual Text Example: Same scene, but the glass panel now reads: “Meet Qwen-Image – a powerful image foundation model capable of complex text rendering and precise image editing. 欢迎了解Qwen-Image, 一款强大的图像基础模型,擅长复杂文本渲染与精准图像编辑.” (The Chinese sentence restates the English one.) The model switches seamlessly between English and Chinese while keeping the text consistent and the layout intact.

Poster and PPT Creation

Qwen-Image can generate professional-quality posters and presentation slides. For example, a movie poster titled “Imagination Unleashed” features a futuristic computer emitting radiant colors, surreal creatures, and swirling patterns. The background transitions from cosmic darkness to a luminous, dreamlike expanse, and the release line “Launching in the Cloud, August 2025” glows at the bottom. The model can also generate a high-quality Chinese PPT slide with a starry blue theme, glowing tech-style lines, and four detailed images of plum blossom, orchid, bamboo, and chrysanthemum, each labeled in elegant calligraphy. The layout is cohesive, visually striking, and professionally structured.

Beyond Text

Qwen-Image supports a wide range of artistic styles, from photorealism and impressionism to anime and minimalist design, making it a versatile tool for creators. It also supports advanced image editing, including style transfer, object addition and removal, text editing, detail enhancement, and pose adjustment, so users can achieve professional results with ease.

In summary, Qwen-Image aims to advance image generation, lower the barriers to visual content creation, and inspire new applications.
We welcome community feedback to help build an open, transparent, and sustainable generative AI ecosystem.