The Max series of Tongyi Qwen’s image generation model excels across a wide range of generation tasks. Compared with the Plus series, it significantly reduces the “AI-like” feel in generated images, enhancing their realism. It delivers more lifelike material textures for human subjects, finer and more detailed natural textures, and more visually appealing text rendering.
Settled by community votes across 1 shared challenge, with an AI judge weighing in on each.
Qwen Image Max
#31 of 44 in Text-to-Image
Z-Image Turbo
#15 of 44 in Text-to-Image
Where the votes landed
Qwen Image Max: 100.0% win rate
Ties: 0.0%
Z-Image Turbo: 0.0% win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Magic Burger Explosion: Fiery Photorealism Challenge
Text-to-Image: “Ad for 'Magic Burger'. Dynamic, exploded burger with all components (bun, patty, cheese, lettuce, tomato, sauce) suspended in mid-air. Emphasize photorealistic detail and a sense of motion. Dark, fiery background with glowing embers. Integrate text: 'MAGIC BURGER' as a prominent title, 'LIMITED TIME ONLY' as a secondary message, and '€6.99' in a starburst, all rendered with a fiery, glowing effect.”
AI Judge Analysis
Qwen Image Max
- + Excellent sense of motion and 'exploded' effect as requested.
- + Highly realistic food textures on the patty and bun.
- + Vibrant and high-contrast fiery background with effective smoke.
- − The main title text is slightly thin and less 'professional' in its font choice.
- − The burger is slightly tilted in a way that feels less balanced than a standard ad.
Z-Image Turbo
- + Perfect text rendering with high-quality glowing font treatments.
- + Great implementation of the starburst for the price tag.
- + Excellent lighting on the burger patties and cheese.
- − Failed the 'exploded burger' requirement; the burger is mostly stacked rather than having suspended components.
- − The composition is a bit more static compared to the dynamic motion requested.
Verdict: Qwen Image Max successfully captured the 'exploded' motion requested in the prompt, creating a much more dynamic and energetic visual. However, Z-Image Turbo produced superior typography and a cleaner overall advertising layout, despite failing to actually separate the burger's components in mid-air.
Explore each model
Z-Image Turbo: Tongyi-MAI's 6-billion-parameter distilled text-to-image model, optimized for speed. It achieves high-quality generation in 8 steps or fewer and supports bilingual text rendering.