GPT Image 1.5 (OpenAI) vs. Stable Diffusion 3.5 Medium (Stability AI)

Settled by community votes across 2 shared challenges, with an AI judge weighing in on each.

GPT Image 1.5

26.5 arena score

#7 of 44 in Text-to-Image

Top 3 in Image Editing


Stable Diffusion 3.5 Medium

15.7 arena score

#41 of 44 in Text-to-Image

Vote tally

Where the votes landed

GPT Image 1.5: 100.0% win rate

Ties: 0.0%

Stable Diffusion 3.5 Medium: 0.0% win rate


Shared challenges: 2

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

The Reversed Rodeo

Text-to-Image

“Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”

Community vote: GPT Image 1.5 100% wins, 0% ties, Stable Diffusion 3.5 Medium 0% wins

AI Judge Analysis

GPT Image 1.5

+ Excellent cinematic lighting and dynamic composition.
+ High level of texture detail on the astronaut suit and horse hair.
+ Coherent environmental details like the lunar lander and dust kicks.

− Failed the spatial constraint ('horse on top, not vice versa'); the astronaut is riding the horse instead of the horse riding the astronaut.

Stable Diffusion 3.5 Medium

+ Successfully placed the astronaut on a horse in a space environment.
+ Clean, minimalist composition.

− Failed the spatial constraint ('horse on top, not vice versa'); the astronaut is riding the horse instead of the horse riding the astronaut.
− Anatomical issues with the horse's legs and the astronaut's lower body.
− Lower overall resolution and detail compared to the competitor.

Verdict: Both models failed the specific spatial logic requested in the prompt ('horse on top, not vice versa'), defaulting to the standard image of a person riding a horse. GPT Image 1.5 is the superior image due to its significantly higher artistic quality, cinematic detail, and realistic textures, whereas Stable Diffusion 3.5 Medium produced anatomical distortions and a flatter aesthetic.

The Halloween Invitation

Text-to-Image

“Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”


AI Judge Analysis

GPT Image 1.5

+ Excellent typography with perfect spelling of all requested text.
+ High-quality cinematic lighting with a coherent vintage gothic aesthetic.
+ Superior composition that integrates the border, scroll, and central subject seamlessly.

− The parchment texture is very dark, which may reduce readability for a physical card.

Stable Diffusion 3.5 Medium

+ Successfully includes the parchment, bats, and jack-o-lantern elements.
+ Uses a clear layout with distinct sections for text.

− Serious spelling errors in almost every line of text ('Invillowen', 'Timme', 'Loccation').
− Poor integration of elements, with pumpkins that appear to float or sit awkwardly in the trees.
− Failed to create a 'central' jack-o-lantern as requested, placing two on the sides instead.

Verdict: GPT Image 1.5 significantly outperforms Stable Diffusion 3.5 Medium by following every textual instruction and capturing the requested stylistic intent. While Stable Diffusion 3.5 Medium struggled with severe spelling errors and a disjointed composition, GPT Image 1.5 produced a polished, professional-looking invitation that is ready for use.

Next steps

Explore each model

GPT Image 1.5

OpenAI

OpenAI's state-of-the-art image generation model, with improved instruction following and prompt adherence.

Vote for this model in the arena

Arena profile Lumenfall catalog

Stable Diffusion 3.5 Medium

Stability AI

Stability AI's 2.5-billion-parameter text-to-image model, built on the improved Multimodal Diffusion Transformer architecture (MMDiT-X) and optimized for consumer hardware, with better image quality, typography, and understanding of complex prompts.

Vote for this model in the arena

Arena profile Lumenfall catalog