DALL-E 3 (OpenAI) vs. Stable Diffusion 3.5 Large (Stability AI)

Settled by community votes across 8 shared challenges, with an AI judge weighing in on each.

DALL-E 3

18.5 arena score

#35 of 44 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.
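The eligibility rule above amounts to a set intersection. The following is a minimal Python sketch, assuming each model's ratings are available as a mapping from category name to score. The function name and data shape are hypothetical and are not taken from the arena's own code.

```python
MIN_SHARED_CATEGORIES = 3  # threshold stated in the text above


def can_show_skill_chart(ratings_a: dict, ratings_b: dict) -> bool:
    """Return True when both models are rated in at least three shared categories."""
    shared = set(ratings_a) & set(ratings_b)  # categories rated for both models
    return len(shared) >= MIN_SHARED_CATEGORIES


# Illustrative input: the only category listed on this page for both models.
dalle_ratings = {"Text-to-Image": 18.5}
sd_ratings = {"Text-to-Image": 22.9}
print(can_show_skill_chart(dalle_ratings, sd_ratings))  # False: one shared category
```

With a single shared category the check fails, which matches the "Not enough comparable category data" notice.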

Stable Diffusion 3.5 Large

22.9 arena score

#25 of 44 in Text-to-Image

Vote tally

Where the votes landed

DALL-E 3

0.0%

win rate

Ties

0.0%

Stable Diffusion 3.5 Large

100.0%

win rate


Shared challenges 8
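The three percentages in the tally sum to 100%, which suggests each one is that outcome's share of all votes cast. A small Python sketch under that assumption follows. The page does not publish raw vote counts, so the counts below are made up for illustration and the function name is hypothetical.

```python
def vote_shares(wins_a: int, ties: int, wins_b: int) -> tuple:
    """Return (share A, share ties, share B) as percentages rounded to one decimal."""
    total = wins_a + ties + wins_b
    if total == 0:
        # No votes yet: report zero for every outcome instead of dividing by zero.
        return (0.0, 0.0, 0.0)
    return tuple(round(100 * n / total, 1) for n in (wins_a, ties, wins_b))


# Hypothetical counts: every vote went to the second model.
print(vote_shares(0, 0, 4))  # (0.0, 0.0, 100.0)
```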

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

DALL-E 3

Stable Diffusion 3.5 Large

AI Judge Analysis

DALL-E 3

+ Excellent high-detail rendering of wood and glass textures
+ Strong atmospheric lighting and cinematic composition
+ Creative interpretation of the blue sphere with an internal forest-like landscape

− Failed the spatial requirement that the red book sit on top of the cube, placing it inside instead
− The plant is behind the glass but not visible through it as a distinct shape
− Added a wooden frame not mentioned in the prompt

Stable Diffusion 3.5 Large

+ Perfect spatial adherence for all objects including the book on top and sphere inside
+ Accurately represents the transparency effect of the plant through the glass
+ Realistic lighting that matches the requested window light direction

− The red book is clipped by the glass cube edges in a physically impossible way
− The surface of the sphere appears slightly flat and lacks the high-end finish of DALL-E 3's
− Composition is a bit cluttered with the background furniture

Verdict: Stable Diffusion 3.5 Large followed the complex spatial instructions perfectly, placing the sphere inside and the book correctly on top, though it suffered from minor clipping issues with the glass. DALL-E 3 produced a much more visually stunning and high-quality image but failed almost every spatial relationship requested, placing the book inside the cube and adding an unrequested wooden frame. Stable Diffusion 3.5 Large is the winner for its superior prompt adherence to specific object placement.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

DALL-E 3

Stable Diffusion 3.5 Large

AI Judge Analysis

DALL-E 3

+ Excellent environmental storytelling with the shop lanterns and wet street textures
+ Stronger adherence to the cinematic lighting and 'imperfect framing' request
+ High-quality reflections on the pavement

− Visible anatomy issues with the man's neck and bare feet
− The out-of-focus foreground elements are a bit distracting and artificial

Stable Diffusion 3.5 Large

+ Natural skin texture is very realistic on the subject's arms and face
+ Consistent rain effect and visible water splashes
+ Cleaner anatomy and more realistic proportions for the man

− Fails the 'motion blur' request as cars in the background appear sharp
− The bicycle has structural inconsistencies, specifically where the frame meets the rear wheel

Verdict: DALL-E 3 captures the 'cinematic' and 'imperfect framing' aspect of the prompt more effectively, creating a moodier atmosphere typical of street photography. However, Stable Diffusion 3.5 Large provides much more realistic skin textures and a more grounded human figure, even though it misses the motion blur requirement. Stable Diffusion 3.5 Large is the winner for its superior visual realism and lack of the distracting anatomical glitches found in DALL-E 3.

Fantasy Warrior

Text-to-Image

“Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”

DALL-E 3

Stable Diffusion 3.5 Large

AI Judge Analysis

DALL-E 3

+ Excellent use of warm torchlight and glowing bokeh effects.
+ Highly detailed engraving on the armor which looks weathered and battle-worn.
+ Intricate beads and jewelry integrated into the hair and braids.

− The facial features look a bit overly airbrushed/smooth compared to the rugged theme.
− The framing is very tight, clipping parts of the character's head.

Stable Diffusion 3.5 Large

+ Provides a more realistic, lifelike skin texture with visible pores and natural dirt.
+ Strong adherence to the braid requirement with clearly defined patterns.
+ Excellent rendering of the cloth and chainmail underlayers.

− The 'bokeh sparks' are much less prominent than in DALL-E 3's image.
− The 'warm torchlight' feels a bit flat and less like a primary light source.

Verdict: DALL-E 3 produces a more cinematic and artistically striking image with vibrant lighting and ornate armor textures that feel very 'fantasy hero'. However, Stable Diffusion 3.5 Large offers more realistic skin textures and a better execution of the braided hair and underlayer details. Stable Diffusion 3.5 Large is the preferred choice for a 'lifelike' portrait, whereas DALL-E 3 excels at the magical atmosphere.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

DALL-E 3

Stable Diffusion 3.5 Large

AI Judge Analysis

DALL-E 3

+ Excellent variety of layout options presented as a grid of mockups
+ High-quality, vibrant food photography that looks appetizing
+ Included all requested menu categories (Appetizers, Pizza, Mains)

− Font choice is stylized/distorted rather than clean sans-serif
− Text legibility is very poor with significant gibberish

Stable Diffusion 3.5 Large

+ Successfully used bold sans-serif fonts as requested
+ Center-column layout follows modern minimalist design principles well
+ Text is significantly more legible and structured

− The grid of photos is placed on the margins rather than being the primary background layout
− Spelling errors in headers (e.g., 'MAIMAES', 'APPETIZRS')

Verdict: Stable Diffusion 3.5 Large followed the typography and stylistic instructions much better, producing a clean, modern layout that looks like a real menu. While DALL-E 3 produced more vibrant food photography, its text rendering and font choices were messy and failed the 'bold sans-serif' requirement.

Isometric Miniature Diorama Scenes

Text-to-Image

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”

DALL-E 3

Stable Diffusion 3.5 Large

AI Judge Analysis

DALL-E 3

+ Excellent 3D cartoon aesthetic with soft, pillowy textures.
+ Perfect adherence to the isometric perspective and square composition.
+ Very clean and professional lighting and shading.

− Failed to place the text 'JAPAN SUSHI' at the top-center.
− Simplified the rice into unrealistic round beads.

Stable Diffusion 3.5 Large

+ Incorporated all requested text 'JAPAN SUSHI' and a flag icon.
+ Higher variety of sushi types and more realistic textures for food items.
+ Strong adherence to the 'small raised diorama base' requirement.

− The perspective is not quite 45-degree isometric, appearing more like a standard 3D render angle.
− The background has a slightly grainy texture rather than being a 'solid' light blue.
− Some visual artifacts present in the chopsticks and small flag details.

Verdict: DALL-E 3 followed the stylistic 'cartoon 3D' and 'isometric' instructions much better, creating an ultra-clean image, though it failed to correctly place the specific text requested at the top. Stable Diffusion 3.5 Large captured all functional prompt elements including the specific text and flag icon, but the overall composition feels less cohesive as a 'miniature diorama' compared to the polished look of DALL-E 3. DALL-E 3 is the preferred choice for its superior visual quality and adherence to the isometric style.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

DALL-E 3

Stable Diffusion 3.5 Large

Community votes: DALL-E 3 0% wins, 0% ties, Stable Diffusion 3.5 Large 100% wins

AI Judge Analysis

DALL-E 3

+ Excellent adherence to lighting requests like 'god rays' and 'dew sparkles'
+ Captures all four requested animals with clear distinction
+ Extremely vibrant colors and clean composition

− Very stylized and illustrative rather than 'hyper-photorealistic'
− Surreal 'butterfly-bird' hybrids that look like artifacts
− Anatomy is very cartoonish/cutesy, lacking natural realism

Stable Diffusion 3.5 Large

+ Much closer to the requested 'hyper-photorealistic' style
+ Dynamic posing that captures the 'playfully chasing' and 'tumbling' aspect of the prompt
+ Beautifully soft bokeh and naturalistic lighting

− The fox kit in the background is slightly blurry and lacks fine detail
− Missing the specific 'tabby' pattern on the kitten, appearing more solid brown
− Less emphasized 'god rays' compared to DALL-E 3

Verdict: While DALL-E 3 produced a very charming and well-composed image, it failed the style requirement by producing a 3D digital illustration rather than a photograph. Stable Diffusion 3.5 Large followed the 'photorealistic' instruction much more effectively, providing a natural-looking scene with dynamic movement and realistic fur textures.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

DALL-E 3

Stable Diffusion 3.5 Large

AI Judge Analysis

DALL-E 3

+ Excellent vector emblem aesthetic with a professional polish.
+ Superior vintage texture and ornamental details.
+ High-quality typography and balanced circular composition.

− Failed to include the specific brand name 'Caffè Florian', replacing it with generic text.
− Logo is quite complex, bordering on illustrative rather than minimalist.

Stable Diffusion 3.5 Large

+ Accurately included the requested brand name 'Caffè Florian' (with a minor double accent).
+ Captures a more minimalist interpretation of the cloche and banner.
+ Strong adherence to the light background and warm brown palette.

− The center icon (suspended cloche) looks slightly awkward and disjointed.
− Texture on the background feels a bit generic and washed out.

Verdict: DALL-E 3 produced a far superior visual design that perfectly captures the prestige of a vintage emblem, but it completely ignored the specific brand name requested. Stable Diffusion 3.5 Large correctly followed the text prompt for the name 'Caffè Florian' and provided a more minimalist layout, though the actual icon design is less refined than its competitor.

Apollo 11: Journey to Tranquility

Text-to-Image

“Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vector style, crisp lines, consistent iconography, subtle gradients only. Steps (stop at landing): 1. Launch (Saturn V icon) 2. Earth Orbit (Earth + orbit ring icon) 3. Translunar (trajectory arc icon) 4. Lunar Orbit (Moon + orbit ring icon) 5. Descent (lunar module descending icon) 6. Landing (lunar module on the surface icon) Small supporting elements (minimal text): • Crew strip: three silhouette icons with only last names: Armstrong, Aldrin, Collins. • Landing site marker: Moon pin labeled "Tranquility" only. Layout constraints: generous margins, large readable labels, clean background with subtle stars. Vector-only, print-poster look, high resolution.”

DALL-E 3

Stable Diffusion 3.5 Large

AI Judge Analysis

DALL-E 3

+ Excellent adherence to the requested color palette and minimalist vector style.
+ Provides three distinct layout options that clearly mimic the vertical poster format.
+ Successfully incorporates sequential numbering and distinct, recognizable icons for different mission phases.

− Includes space shuttle-like icons which are historically inaccurate for the Apollo 11 Saturn V mission.
− The infographic content is mostly filler text and repetition rather than a strictly logical 6-step flow.

Stable Diffusion 3.5 Large

+ Features a more cinematic composition with a large, detailed lunar surface.
+ Accurately represents the Earth, Moon, and celestial bodies with decent shading.
+ Correctly identifies names like 'Collins' even if letters are slightly distorted.

− Fails to follow the 'infographic poster' layout, appearing more like a singular illustration with random text.
− Does not clearly visualize the 6 requested steps in a sequential, easy-to-read manner.
− Includes a space shuttle and several non-mission related planets (like a ringed planet) which contradict the prompt.

Verdict: DALL-E 3 is the clear winner as it successfully interprets the 'infographic poster' requirement through structured layouts, intentional iconography, and a consistent vector style. While Stable Diffusion 3.5 Large has high visual fidelity in its lunar imagery, it fails the basic task of creating a sequential 6-step infographic and includes irrelevant elements like a space shuttle and a Saturn-like ringed planet.

Next steps

Explore each model

DALL-E 3

OpenAI

OpenAI's previous-generation image model, with higher quality than DALL-E 2 and support for larger resolutions


Stable Diffusion 3.5 Large

Stability AI

Stability AI's 8.1-billion-parameter Multimodal Diffusion Transformer (MMDiT) text-to-image model, featuring improved image quality, typography, complex prompt understanding, and resource efficiency
