OpenAI's previous-generation image model, with higher quality than DALL-E 2 and support for larger resolutions
Settled by community votes across 2 shared challenges, with an AI judge weighing in on each.
DALL-E 3
#35 of 44 in Text-to-Image
Stable Diffusion 3.5 Medium
#41 of 44 in Text-to-Image
Where the votes landed
DALL-E 3: 0% win rate
Ties: 0%
Stable Diffusion 3.5 Medium: 0% win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
The Reversed Rodeo
Text-to-Image: “Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”
AI Judge Analysis
DALL-E 3
- + Excellent cinematic lighting and atmospheric effects
- + High visual coherence with the celestial background
- + Clearer representation of the 'space' environment using nebulae and clouds
- − Failed the negative constraint; the astronaut is riding the horse rather than the horse riding the astronaut
Stable Diffusion 3.5 Medium
- + Realistic texture on the astronaut suit and horse hair
- + Good composition with the planet curvature in the background
- − Failed the central negative constraint; horse is on bottom
- − Visible anatomy errors including five legs on the horse
- − The astronaut appears to have a third leg or extra appendage dangling from the saddle
Verdict: Both DALL-E 3 and Stable Diffusion 3.5 Medium failed to follow the specific spatial instruction for the horse to be on top of the astronaut. However, DALL-E 3 produced a significantly higher-quality image with better lighting and fewer anatomical errors, whereas Stable Diffusion 3.5 Medium suffered from severe physical artifacts, including extra limbs on both the horse and rider.
The Halloween Invitation
Text-to-Image: “Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”
AI Judge Analysis
DALL-E 3
- + Exquisite framing with 3D-like depth and intricate gothic textures.
- + Superior cinematic lighting and atmosphere that perfectly matches the 'vintage gothic' prompt.
- + The main title 'HALLOWEEN INVITATION' is rendered with high stylistic consistency.
- − The smaller secondary text and event details are illegible or garbled.
- − The date and time are scattered and incorrect compared to the prompt requirements.
Stable Diffusion 3.5 Medium
- + Text is much closer to being legible, including specific words like 'The Aches' and 'NYC'.
- + Clear inclusion of all requested elements like twisted trees, pumpkins, and parchment in a readable layout.
- − Significant spelling errors in the main title ('Halloweeen Inviloween').
- − Simple, almost clip-art style compared to the sophisticated cinematic aesthetic requested.
Verdict: DALL-E 3 creates a stunningly moody and high-quality artistic piece that captures the gothic aesthetic perfectly, though it fails on the specific text details. Stable Diffusion 3.5 Medium attempts more of the literal text but fails with numerous spelling errors and a much flatter, less 'cinematic' visual style. DALL-E 3 is the preferred choice for a card base due to its superior composition and atmospheric quality.
Explore each model
Stability AI's 2.5-billion-parameter Multimodal Diffusion Transformer with improvements (MMDiT-X), a text-to-image model optimized for consumer hardware, featuring improved image quality, typography, and complex prompt understanding