OpenAI's previous-generation image model, with higher quality than DALL-E 2 and support for larger resolutions
Settled by community votes across 2 shared challenges, with an AI judge weighing in on each.
DALL-E 3
#35 of 44 in Text-to-Image
Qwen Image 2.0
#32 of 44 in Text-to-Image
Where the votes landed
DALL-E 3: 100.0% win rate
Ties: 0.0%
Qwen Image 2.0: 0.0% win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
The Reversed Rodeo
Text-to-Image: “Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”
AI Judge Analysis
DALL-E 3
- + Excellent cinematic lighting and atmospheric depth
- + Surreal composition with consistent glowing nebula elements
- + Strong adherence to the 'cinematic' and 'detailed' descriptors
- − Failed the specific spatial instruction (astronaut is riding the horse, not horse riding the astronaut)
Qwen Image 2.0
- + High textural detail on the astronaut suit and horse mane
- + Creative scaly texture on the horse adding to the surreal theme
- + Very sharp resolution and clear foreground focus
- − Failed the specific spatial instruction (astronaut is riding the horse, not horse riding the astronaut)
- − The horse's legs have anatomical inconsistencies near the joints
Verdict: Both DALL-E 3 and Qwen Image 2.0 suffered from 'semantic bleaching' and ignored the explicit instruction to put the horse on top of the astronaut. DALL-E 3 is visually superior thanks to its cinematic lighting and more cohesive surreal environment, whereas Qwen Image 2.0 feels more like a standard stock image, with sharper but more localized detail.
The Halloween Invitation
Text-to-Image: “Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”
AI Judge Analysis
DALL-E 3
- + Excellent 3D depth and complex layered composition
- + Very high artistic detail in the border, thorns, and webs
- + Moody and cinematic lighting that creates a high-end feel
- − Text rendering is mostly gibberish
- − The jack-o-lantern is very small in the composition
Qwen Image 2.0
- + Perfect text accuracy for all requested fields, including the date and location
- + Strong adherence to all prompt elements, including the scroll banner
- + Clear and legible gothic typography
- − The composition is a bit flat and less cinematic than requested
- − The jack-o-lantern and bats look slightly more generic than DALL-E 3's more artistic treatment
Verdict: DALL-E 3 produces a far more visually striking and atmospheric piece of art, but its text is largely illegible, which makes the invitation unusable. Qwen Image 2.0 follows every text instruction exactly, making it the better choice for a functional invitation despite its slightly less complex visual style.
Explore each model
Alibaba's Qwen Image 2.0 model with enhanced text rendering, supporting both Chinese and English prompts with up to 6 images per request