GPT Image 2 vs Stable Diffusion 3.5 Medium

Head-to-head across 2 challenges

GPT Image 2: 100.0% win rate

Ties: 0.0%

Stable Diffusion 3.5 Medium: 0.0% win rate
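The head-to-head percentages above are simple proportions over the challenges. A minimal sketch of that arithmetic in Python, assuming each challenge produces exactly one outcome (a winner or a tie). The `head_to_head` function, the `"tie"` label, and the `outcomes` list are hypothetical names, and the two outcomes are inferred from the 100.0% overall win rate rather than taken from raw vote data, which the page does not show.

```python
from collections import Counter


def head_to_head(outcomes, model_a, model_b):
    """Return win and tie percentages for two models.

    Each entry in `outcomes` is model_a, model_b, or "tie"
    (a hypothetical encoding chosen for this sketch).
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(outcomes)  # missing keys count as 0
    total = len(outcomes)
    return {
        model_a: round(100 * counts[model_a] / total, 1),
        "tie": round(100 * counts["tie"] / total, 1),
        model_b: round(100 * counts[model_b] / total, 1),
    }


# Assumed outcomes for the two challenges (inferred, not raw data).
outcomes = ["GPT Image 2", "GPT Image 2"]
rates = head_to_head(outcomes, "GPT Image 2", "Stable Diffusion 3.5 Medium")
print(rates)
```

With only two challenges, each outcome moves a rate by 50 percentage points, so the summary is best read alongside the per-challenge analysis.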

Challenge Results

The Reversed Rodeo

Text-to-Image

“Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”

GPT Image 2: 100% wins
Ties: 0%
Stable Diffusion 3.5 Medium: 0% wins

AI Judge Analysis

GPT Image 2

  • + Excellent adherence to the counter-intuitive instruction of the horse riding the human
  • + High-quality textures on the space suit and lunar surface
  • + Clever details like the saddle designed for a human back and the horse holding reins
  • - The astronaut's gloved hands have the wrong number of fingers
  • - The Earth in the background is slightly blurry compared to the foreground

Stable Diffusion 3.5 Medium

  • + Beautiful cinematic lighting and composition
  • + Dynamic pose with a sense of motion in the horse's mane
  • - Completely failed the key constraint that the horse be on top, not the astronaut
  • - Anatomical issues with the horse's legs and the astronaut's leg placement
  • - Low-resolution blurring on the Earth's surface

Verdict: GPT Image 2 followed the counter-intuitive prompt instruction, depicting the surreal sight of a horse riding an astronaut. Stable Diffusion 3.5 Medium fell into a common bias and placed the astronaut on top of the horse, failing the prompt's primary requirement despite a pleasant color palette.

The Halloween Invitation

Text-to-Image

“Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”

GPT Image 2
Stable Diffusion 3.5 Medium

AI Judge Analysis

GPT Image 2

  • + Excellent typography with perfect spelling in all requested text fields.
  • + Superior atmospheric lighting and intricate gothic detailing in the border and background.
  • + Highly cohesive composition that feels like a professional invitation.
  • - The jack-o-lantern is central but slightly less 'glowy' than the lanterns nearby.

Stable Diffusion 3.5 Medium

  • + Successfully includes the twisted trees and parchment aesthetic.
  • + Distinctive jack-o-lantern expressions with high contrast.
  • - Spelling errors in almost all text fields, including 'Halloweeen' and 'Inviloween'.
  • - The layout is less polished and lacks the requested cinematic lighting.
  • - Text is poorly centered, and various elements feel disconnected.

Verdict: GPT Image 2 executes the prompt masterfully, delivering perfect text rendering and a rich, atmospheric gothic aesthetic. In contrast, Stable Diffusion 3.5 Medium struggles significantly with the text requirements and offers a much flatter, less professional composition. GPT Image 2's attention to detail in the border and its clever integration of NYC-themed architecture (the bridge/arches) make it the clear winner.

GPT Image 2

OpenAI's state-of-the-art image generation model with arbitrary resolution up to 4K and strong instruction following

Stable Diffusion 3.5 Medium

Stability AI's 2.5-billion-parameter text-to-image model, built on an improved Multimodal Diffusion Transformer (MMDiT-X) and optimized for consumer hardware, featuring improved image quality, typography, and complex prompt understanding