GPT Image 1.5 vs Stable Diffusion 3.5 Large

Head-to-head across 10 challenges

GPT Image 1.5: 75.0% win rate
Ties: 0.0%
Stable Diffusion 3.5 Large: 25.0% win rate

Challenge Results

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

GPT Image 1.5: 33% wins | Ties: 0% | Stable Diffusion 3.5 Large: 67% wins

AI Judge Analysis

GPT Image 1.5

  • + Perfect adherence to the spatial arrangement requested.
  • + Highly realistic glass reflections and refractions of the background plant.
  • + Excellent material textures, especially on the book's canvas cover and the wooden table.
  • The blue sphere is relatively large compared to the prompt's 'small blue sphere'.

Stable Diffusion 3.5 Large

  • + Good lighting effects and sharp focus on the glass cube.
  • + Accurate 'small' scale for the blue sphere.
  • Incorrect object placement; the book is inside the cube rather than sitting on top of it.
  • Coherence issues: the glass cube appears to clip through the red book.
  • The plant is mostly to the side/front rather than behind the cube as requested.

Verdict: GPT Image 1.5 followed the complex spatial instructions perfectly, correctly placing the book on top of the cube and the plant behind it. Stable Diffusion 3.5 Large struggled with the spatial logic, placing the book inside the cube and failing to clearly place the plant behind the glass, which was a core element of the prompt.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

GPT Image 1.5: 100% wins | Ties: 0% | Stable Diffusion 3.5 Large: 0% wins

AI Judge Analysis

GPT Image 1.5

  • + Excellent, realistic skin and fabric textures.
  • + Strong adherence to 'imperfect framing' with a tight, candid composition.
  • + Detailed mechanical parts on the bicycle and repair tools.
  • The car in the background lacks the requested motion blur.
  • The raindrops appear as static white dots rather than falling streaks.

Stable Diffusion 3.5 Large

  • + Successfully captured motion blur on the background vehicles.
  • + Good representation of falling rain and wet pavement reflections.
  • + Accurate adherence to 'shallow depth of field' with a soft background.
  • Anatomical issues: the man's hands and arms appear distorted.
  • The bicycle's structure is physically impossible with floating and merging parts.
  • The man appears slightly 'pasted' into the scene with mismatched lighting.

Verdict: GPT Image 1.5 is the superior image due to its high level of photorealism, particularly in the subject's skin texture and the mechanical detail of the bike. While Stable Diffusion 3.5 Large followed the 'motion blur' prompt better, it failed significantly on structural coherence, producing mangled hands and an impossible bicycle frame.

Fantasy Warrior

Text-to-Image

“Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”

GPT Image 1.5: 100% wins | Ties: 0% | Stable Diffusion 3.5 Large: 0% wins

AI Judge Analysis

GPT Image 1.5

  • + Excellent adherence to all prompt details including beads in hair and leather straps.
  • + Superior lighting effects with warm torchlight and realistic bokeh sparks.
  • + Highly detailed facial texture with convincing scars and dirt.
  • The composition is a bit tight on the forehead.

Stable Diffusion 3.5 Large

  • + Beautifully detailed ornate engraving on the plate armor.
  • + Strong character expression and clear facial features.
  • + Good interpretation of the braided hair requirement.
  • Missed the 'small beads' in the hair mentioned in the prompt.
  • The lighting feels more like daylight than the requested warm torchlight.
  • Lacks the specific bokeh sparks requested.

Verdict: GPT Image 1.5 is the clear winner as it followed every specific detail of the prompt, including the beads in the hair, the leather straps, and the specific warm torchlight atmosphere with bokeh sparks. Stable Diffusion 3.5 Large produced a high-quality image with impressive armor engraving, but it failed to include the beads and the lighting felt too cool and diffused compared to the torchlight requested.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

GPT Image 1.5: 100% wins | Ties: 0% | Stable Diffusion 3.5 Large: 0% wins

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

GPT Image 1.5: 100% wins | Ties: 0% | Stable Diffusion 3.5 Large: 0% wins

Victorian Greenhouse Oasis

Text-to-Image

“Hyper-photorealistic interior of a lush Victorian glass greenhouse filled with exotic tropical plants, vibrant blooming orchids, tall ferns, colorful butterflies in flight, sunlight filtering through ornate glass roof creating realistic caustics and dew on leaves, intricate iron framework visible, misty atmosphere, 8K masterpiece.”

GPT Image 1.5: 80% wins | Ties: 0% | Stable Diffusion 3.5 Large: 20% wins

Heroic Super Hero Portrait

Text-to-Image

“Hyper-photorealistic full-body portrait of a female superhero standing triumphantly on a New York skyscraper rooftop at golden sunset, wearing a classic modest superhero costume with flowing cape, chest emblem, gloves, and boots in red and blue colors, practical design, short hair, strong determined heroic expression looking into the distance, powerful confident stance with hands on hips and cape billowing dramatically in the wind, detailed urban cityscape background, warm natural sunlight with sharp shadows and fabric highlights, ultra-sharp textures on suit, hair, and concrete, 8K masterpiece, empowering family-friendly style.”

GPT Image 1.5: 100% wins | Ties: 0% | Stable Diffusion 3.5 Large: 0% wins

Intricate Floral Mandala

Text-to-Image

“Perfectly symmetrical mandala made entirely of real flowers, petals, leaves, fruits, and seeds in vibrant natural colors, intricate layered patterns with radial symmetry, top-down view on a soft neutral background, hyper-detailed organic textures and subtle shadows, photorealistic, 8K masterpiece.”

GPT Image 1.5: 60% wins | Ties: 0% | Stable Diffusion 3.5 Large: 40% wins

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

GPT Image 1.5: 100% wins | Ties: 0% | Stable Diffusion 3.5 Large: 0% wins

Apollo 11: Journey to Tranquility

Text-to-Image

“Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vector style, crisp lines, consistent iconography, subtle gradients only. Steps (stop at landing): 1. Launch (Saturn V icon) 2. Earth Orbit (Earth + orbit ring icon) 3. Translunar (trajectory arc icon) 4. Lunar Orbit (Moon + orbit ring icon) 5. Descent (lunar module descending icon) 6. Landing (lunar module on the surface icon) Small supporting elements (minimal text): • Crew strip: three silhouette icons with only last names: Armstrong, Aldrin, Collins. • Landing site marker: Moon pin labeled "Tranquility" only. Layout constraints: generous margins, large readable labels, clean background with subtle stars. Vector-only, print-poster look, high resolution.”

GPT Image 1.5: 25% wins | Ties: 0% | Stable Diffusion 3.5 Large: 75% wins

GPT Image 1.5

OpenAI's state-of-the-art image generation model, with improved instruction following and prompt adherence.

Stable Diffusion 3.5 Large

Stability AI's 8.1-billion-parameter Multimodal Diffusion Transformer (MMDiT) text-to-image model, featuring improved image quality, typography, complex-prompt understanding, and resource efficiency.