Alibaba's Qwen Image 2.0 model with enhanced text rendering, supporting both Chinese and English prompts with up to 6 images per request
Settled by community votes across 2 shared challenges, with an AI judge weighing in on each.
Qwen Image 2.0
#32 of 44 in Text-to-Image
Not enough comparable category data
The chart appears once both models have ratings across at least three shared arena categories.
Stable Diffusion 3.5 Medium
#41 of 44 in Text-to-Image
Where the votes landed
Qwen Image 2.0: 33.3% win rate
Ties: 33.3%
Stable Diffusion 3.5 Medium: 33.3% win rate
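For readers wondering how a three-way split like this comes about, here is a minimal sketch of the arithmetic, assuming each share is simply that outcome's vote count divided by all votes cast. The function name and the vote counts in the usage note are hypothetical; the page does not publish raw tallies or its exact formula.

```python
def vote_shares(wins_a: int, wins_b: int, ties: int) -> dict:
    """Return each outcome's percentage share of all votes, rounded to one decimal.

    Assumption: ties count toward the total but toward neither model's win rate.
    """
    total = wins_a + wins_b + ties
    if total == 0:
        raise ValueError("no votes recorded")
    return {
        "model_a": round(100 * wins_a / total, 1),
        "ties": round(100 * ties / total, 1),
        "model_b": round(100 * wins_b / total, 1),
    }
```

Under that assumption, any tally with equal counts for all three outcomes (for example, a hypothetical one win each plus one tie) yields 33.3% across the board.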
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
The Reversed Rodeo
Text-to-Image: “Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”
AI Judge Analysis
Qwen Image 2.0
- + Excellent high-frequency details on the spacesuit and horse's mane
- + Cinematic lighting with a clear sense of depth and scale against Earth
- + Creative surreal touches like the scales on the horse and floating water droplets
- − Failed the specific spatial instruction for the horse to be 'on top' of the astronaut
Stable Diffusion 3.5 Medium
- + Naturalistic posing of the rider on the horse
- + Good wide-angle cinematic composition
- + Clear rendering of the Earth's atmosphere and surface below
- − Failed the specific spatial instruction for the horse to be 'on top' of the astronaut
- − Noticeable anatomical errors in the horse's legs
- − Lower texture detail compared to the competitor
Verdict: Both Qwen Image 2.0 and Stable Diffusion 3.5 Medium failed the spatial constraint that puts the horse on top of the astronaut, rendering the standard astronaut-on-horse arrangement instead. Qwen Image 2.0 produced the stronger image, with a much higher level of detail, a beautifully rendered spacesuit, and creative surreal elements such as the horse's scales and floating droplets.
The Halloween Invitation
Text-to-Image: “Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”
AI Judge Analysis
Qwen Image 2.0
- + Excellent typography with perfect spelling of all requested text.
- + High-quality, cinematic lighting on the central jack-o-lantern.
- + Strong adherence to the border of webs and thorns requested in the prompt.
- − The parchment texture is a bit safe and flat compared to the complex background.
Stable Diffusion 3.5 Medium
- + Dynamic composition with multiple pumpkins and a torn-edge effect on the parchment.
- + Atmospheric color palette with a good contrast between the blues and oranges.
- − Numerous spelling errors including 'Halloweeen Inviloween' and 'The Aches'.
- − Failed to include the central jack-o-lantern, placing pumpkins in the corners instead.
- − Did not include the requested 'scroll banner' for the secondary text.
Verdict: Qwen Image 2.0 followed the prompt instructions perfectly, rendering all text accurately and placing all compositional elements exactly where requested. Stable Diffusion 3.5 Medium struggled significantly with the text rendering and failed to include specific layout elements like the scroll banner and central jack-o-lantern.
Explore each model
Stability AI's 2.5-billion-parameter Multimodal Diffusion Transformer with improvements (MMDiT-X) text-to-image model, optimized for consumer hardware and featuring improved image quality, typography, and complex prompt understanding