
GPT Image 2 vs Stable Diffusion 3.5 Medium

Head-to-head across 2 challenges

GPT Image 2: 100.0% win rate

Ties: 0.0%

Stable Diffusion 3.5 Medium: 0.0% win rate
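As a rough illustration of how the head-to-head percentages above follow from the per-challenge outcomes, here is a minimal sketch. The outcome encoding (a winner's name per challenge, or the string "tie") and the function name `head_to_head` are assumptions made for this example; the page does not describe how the arena actually computes its figures.

```python
from collections import Counter

# Winner of each challenge as reported on this page.
# Assumed encoding: the winning model's name, or "tie" for a draw.
results = {
    "The Reversed Rodeo": "GPT Image 2",
    "The Halloween Invitation": "GPT Image 2",
}


def head_to_head(results, model_a, model_b):
    """Return win and tie percentages for two models over a set of challenges."""
    total = len(results)
    if total == 0:
        raise ValueError("at least one challenge result is required")
    counts = Counter(results.values())  # missing keys count as 0
    return {
        model_a: 100 * counts[model_a] / total,
        "Ties": 100 * counts["tie"] / total,
        model_b: 100 * counts[model_b] / total,
    }


summary = head_to_head(results, "GPT Image 2", "Stable Diffusion 3.5 Medium")
print(summary)
# {'GPT Image 2': 100.0, 'Ties': 0.0, 'Stable Diffusion 3.5 Medium': 0.0}
```

With only 2 challenges, each result shifts the win rate by 50 percentage points, so the 100.0% figure is best read as "won both challenges" rather than as a stable estimate.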

Challenge Results

The Reversed Rodeo

Text-to-Image

“Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”

Result: GPT Image 2 100% wins, 0% ties, Stable Diffusion 3.5 Medium 0% wins

AI Judge Analysis

GPT Image 2

+ Excellent adherence to the counter-intuitive instruction of the horse riding the human
+ High-quality textures on the space suit and lunar surface
+ Clever details like the saddle designed for a human back and the horse holding reins

− The astronaut's hands/gloves have an incorrect number of fingers
− The earth in the background is slightly blurry compared to the foreground

Stable Diffusion 3.5 Medium

+ Beautiful cinematic lighting and composition
+ Dynamic pose with a sense of motion in the horse's mane

− Completely failed the core constraint: the astronaut rides the horse, ignoring the "horse on top, not vice versa" instruction
− Anatomical issues with the horse's legs and the astronaut's leg placement
− Low-resolution blurring on the earth's surface

Verdict: GPT Image 2 followed the complex prompt instruction perfectly, depicting the surreal sight of a horse riding an astronaut. Stable Diffusion 3.5 Medium fell into a common bias and placed the astronaut on top of the horse, failing the primary challenge of the prompt despite having a pleasant color palette.

The Halloween Invitation

Text-to-Image

“Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”

AI Judge Analysis

GPT Image 2

+ Excellent typography with perfect spelling in all requested text fields.
+ Superior atmospheric lighting and intricate gothic detailing in the border and background.
+ Highly cohesive composition that feels like a professional invitation.

− The jack-o-lantern is central but slightly less 'glowy' than the lanterns nearby.

Stable Diffusion 3.5 Medium

+ Successfully includes the twisted trees and parchment aesthetic.
+ Distinctive jack-o-lantern expressions with high contrast.

− Spelling errors in almost all text fields, including 'Halloweeen' and 'Inviloween'.
− The layout is less polished and lacks the cinematic lighting requested.
− Text is poorly centered and various elements feel disconnected.

Verdict: GPT Image 2 delivers a masterful execution of the prompt, with perfect text rendering and a rich, atmospheric gothic aesthetic. In contrast, Stable Diffusion 3.5 Medium struggles significantly with the text requirements and offers a much flatter, less professional composition. GPT Image 2's attention to detail in the border and its clever integration of NYC-themed architecture (the bridge/arches) make it the clear winner.

GPT Image 2

OpenAI's state-of-the-art image generation model with arbitrary resolution up to 4K and strong instruction following

Stable Diffusion 3.5 Medium

Stability AI's 2.5-billion-parameter text-to-image model, built on an improved Multimodal Diffusion Transformer (MMDiT-X) and optimized for consumer hardware, featuring improved image quality, typography, and complex prompt understanding
