GPT Image 1 Mini vs Stable Diffusion 3.5 Large

Head-to-head across 4 challenges

GPT Image 1 Mini: 75.0% win rate

Ties: 0.0%

Stable Diffusion 3.5 Large: 25.0% win rate

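The page does not state how these percentages are aggregated. As a minimal sketch, assuming each figure is simply a share of individual votes, the arithmetic could look like the following. The function name `win_rates` and the vote counts are hypothetical, chosen only because they reproduce the displayed numbers.

```python
def win_rates(wins_a: int, ties: int, wins_b: int) -> tuple[float, float, float]:
    """Return (A win %, tie %, B win %), each rounded to one decimal place."""
    total = wins_a + ties + wins_b
    if total == 0:
        raise ValueError("at least one vote is required")
    return tuple(round(100 * n / total, 1) for n in (wins_a, ties, wins_b))


# Hypothetical tally (not published on the page): 9 votes for model A,
# 0 ties, 3 votes for model B.
overall = win_rates(9, 0, 3)
print(overall)  # (75.0, 0.0, 25.0)
```

Under the same assumption, a 2-to-1 split of three votes would give 66.7% and 33.3%, which round to the 67% and 33% shown for some individual challenges.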

Challenge Results

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

AI Judge Analysis

GPT Image 1 Mini

+ Perfect adherence to the spatial requirements of the prompt.
+ Higher photographic realism with soft, natural lighting.
+ Clean composition with a clear view of the plant through the glass.

− The blue sphere appears slightly larger than a 'small' sphere.
− The book is floating slightly above the glass rim rather than resting flat.

Stable Diffusion 3.5 Large

+ High clarity and sharp details on the wooden surface and glass edges.
+ Accurate interpretation of the plant being behind the cube.

− Failed to place the red book on top of the cube, placing it underneath instead.
− The lighting is harsh and direct rather than the requested 'soft window light'.

Verdict: GPT Image 1 Mini followed the spatial instructions, placing the book on top of the cube and the sphere inside it. Stable Diffusion 3.5 Large failed the primary layout task by placing the book underneath the cube, although it produced a very sharp image with crisp textures.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

GPT Image 1 Mini 100% wins, 0% ties, Stable Diffusion 3.5 Large 0% wins

AI Judge Analysis

GPT Image 1 Mini

+ Excellent shallow depth of field and bokeh.
+ Highly realistic skin texture and facial lighting.
+ Strong cinematic atmosphere with natural, muted colors.

− The white car in the background lacks the requested motion blur.
− The man's hands interact with the rear wheel spokes in an anatomically implausible way.

Stable Diffusion 3.5 Large

+ Better adherence to the motion blur request for passing vehicles.
+ Captures the scale of a Japanese street with the bus and signage.
+ Vibrant colors and convincing wet pavement reflections.

− The 'rain' looks like static vertical lines rather than realistic droplets.
− The bicycle geometry is broken (missing seat post, misaligned frame).
− The image has a slightly AI-processed 'sheen' that ignores the 'no stylization' request.

Verdict: GPT Image 1 Mini produced a far more convincing portrait, with superior skin texture and photographic depth, though it missed the requested motion blur. Stable Diffusion 3.5 Large followed more of the prompt's background instructions but fell short on execution, with a poorly rendered bicycle and unrealistic rain. GPT Image 1 Mini is preferred for its realism and believable cinematic quality.

Fantasy Warrior

Text-to-Image

“Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”

GPT Image 1 Mini 67% wins, 0% ties, Stable Diffusion 3.5 Large 33% wins

AI Judge Analysis

GPT Image 1 Mini

+ Excellent depiction of warm torchlight reflecting off the metal surfaces.
+ Highly detailed skin texture with convincing dirt and aging.
+ The ornate engraving on the plate armor is complex and aesthetically pleasing.

− Missed the request for small beads in the braided hair.
− The armor engraving lacks some of the physical depth/relief found in the competitor.

Stable Diffusion 3.5 Large

+ Very crisp skin texture and striking, lifelike eyes.
+ Excellent implementation of braided hair as requested.
+ The 'battle-worn' aesthetic is strong with visible dirt and high-contrast armor detailing.

− The 'warm torchlight' is much weaker and less atmospheric than in GPT Image 1 Mini's image.
− Lacks the requested 'beads' in the hair.
− The metal of the armor looks slightly flat or overly bright in some areas despite being battle-worn.

Verdict: Both models captured the essence of the prompt, but GPT Image 1 Mini handled lighting and atmosphere better, creating a much more convincing 'torchlight' effect. Stable Diffusion 3.5 Large produced a sharper image with better hair braids and lifelike eyes, but its lighting felt more like generic daylight. Both models failed to include the requested beads in the braids.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

GPT Image 1 Mini 67% wins, 0% ties, Stable Diffusion 3.5 Large 33% wins

AI Judge Analysis

GPT Image 1 Mini

+ Excellent anatomical accuracy for all four animals.
+ Rich, tactile fur texture and clear, expressive eyes.
+ Clearer rendering of the 'god rays' and sunrise lighting mentioned in the prompt.

− Composition feels a bit crowded towards the edges.
− Butterflies appear slightly flat compared to the animals.

Stable Diffusion 3.5 Large

+ Dynamic composition with a nice sense of movement and 'tumbling'.
+ Good use of bokeh and depth of field in the foreground/background.
+ Inclusion of plenty of butterflies to match the 'playfully chasing' prompt.

− The kitten has anatomically incorrect large, pointed fox-like ears.
− Lower overall sharpness and fine detail in the fur textures.
− Lighting feels a bit washed out in the center.

Verdict: GPT Image 1 Mini is the winner due to its superior anatomical accuracy and high-fidelity textures, whereas Stable Diffusion 3.5 Large struggled with the kitten's anatomy, giving it fox-like features. Both models followed the prompt well, but GPT Image 1 Mini's lighting and clarity felt more like the requested '8K masterpiece'.

GPT Image 1 Mini

OpenAI's cost-effective image generation model for when image quality isn't the top priority


Stable Diffusion 3.5 Large

Stability AI's 8.1-billion-parameter Multimodal Diffusion Transformer (MMDiT) text-to-image model, featuring improved image quality, typography, complex prompt understanding, and resource efficiency
