Qwen Image 2.0 (Alibaba) vs. Stable Diffusion 3.5 Medium (Stability AI)

Settled by community votes across 2 shared challenges, with an AI judge weighing in on each.

Qwen Image 2.0

19.8 arena score

#32 of 44 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.

Stable Diffusion 3.5 Medium

15.7 arena score

#41 of 44 in Text-to-Image

Vote tally

Where the votes landed

Qwen Image 2.0

33.3%

win rate

Ties

33.3%

Stable Diffusion 3.5 Medium

33.3%

win rate

Shared challenges: 2
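The percentages above are consistent with an even three-way split of the votes. As a rough sketch of how such shares could be derived from raw ballots, the Python below tallies outcomes and converts them to percentages. The page does not publish raw vote counts, so the function name `vote_shares`, the outcome labels, and the example ballots are all hypothetical.

```python
from collections import Counter


def vote_shares(votes):
    """Return each outcome's share of the votes as a percentage (one decimal)."""
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        # No ballots cast: nothing to report.
        return {}
    return {outcome: round(100 * n / total, 1) for outcome, n in counts.items()}


# Hypothetical ballots, chosen only to illustrate an even split.
example_votes = ["qwen", "tie", "sd35"]
shares = vote_shares(example_votes)
print(shares)
```

Any ballot set that is a multiple of this even split would produce the same rounded figures, so the published percentages alone do not reveal how many votes were cast.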

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

The Reversed Rodeo

Text-to-Image

“Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”

Qwen Image 2.0

Stable Diffusion 3.5 Medium

Qwen Image 2.0: 33% wins · Ties: 33% · Stable Diffusion 3.5 Medium: 33% wins

AI Judge Analysis

Qwen Image 2.0

+ Excellent high-frequency details on the spacesuit and horse's mane
+ Cinematic lighting with a clear sense of depth and scale against Earth
+ Creative surreal touches like the scales on the horse and floating water droplets

− Failed the specific spatial instruction for the horse to be 'on top' of the astronaut

Stable Diffusion 3.5 Medium

+ Naturalistic posing of the rider on the horse
+ Good wide-angle cinematic composition
+ Clear rendering of the Earth's atmosphere and surface below

− Failed the specific spatial instruction for the horse to be 'on top' of the astronaut
− Noticeable anatomical errors in the horse's legs
− Lower texture detail compared to the competitor

Verdict: Both Qwen Image 2.0 and Stable Diffusion 3.5 Medium failed the negative/spatial constraint to put the 'horse on top' of the astronaut, producing the standard rider-on-horse configuration instead. Qwen Image 2.0 produced the stronger image, with a much higher level of detail, a well-rendered spacesuit, and creative surreal elements such as the horse's scales and floating droplets.

The Halloween Invitation

Text-to-Image

“Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”

Qwen Image 2.0

Stable Diffusion 3.5 Medium

AI Judge Analysis

Qwen Image 2.0

+ Excellent typography with perfect spelling of all requested text.
+ High-quality, cinematic lighting on the central jack-o-lantern.
+ Strong adherence to the border of webs and thorns requested in the prompt.

− The parchment texture is a bit safe and flat compared to the complex background.

Stable Diffusion 3.5 Medium

+ Dynamic composition with multiple pumpkins and a torn-edge effect on the parchment.
+ Atmospheric color palette with a good contrast between the blues and oranges.

− Numerous spelling errors including 'Halloweeen Inviloween' and 'The Aches'.
− Failed to include the central jack-o-lantern, placing pumpkins in the corners instead.
− Did not include the requested 'scroll banner' for the secondary text.

Verdict: Qwen Image 2.0 followed the prompt instructions perfectly, rendering all text accurately and placing all compositional elements exactly where requested. Stable Diffusion 3.5 Medium struggled significantly with the text rendering and failed to include specific layout elements like the scroll banner and central jack-o-lantern.

Next steps

Explore each model

Qwen Image 2.0

Alibaba

Alibaba's Qwen Image 2.0 model with enhanced text rendering, supporting both Chinese and English prompts with up to 6 images per request

Stable Diffusion 3.5 Medium

Stability AI

Stability AI's 2.5-billion-parameter Multimodal Diffusion Transformer with improvements (MMDiT-X), a text-to-image model optimized for consumer hardware, featuring improved image quality, typography, and complex prompt understanding
