GPT Image 1.5 is OpenAI's state-of-the-art image generation model, with improved instruction following and prompt adherence.
Settled by community votes across 2 shared challenges, with an AI judge weighing in on each.
GPT Image 1.5
#7 of 44 in Text-to-Image
Stable Diffusion 3.5 Large Turbo
#44 of 44 in Text-to-Image
Where the votes landed
GPT Image 1.5: 100.0% win rate
Ties: 0.0%
Stable Diffusion 3.5 Large Turbo: 0.0% win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Magic Burger Explosion: Fiery Photorealism Challenge
Text-to-Image
“Ad for 'Magic Burger'. Dynamic, exploded burger with all components (bun, patty, cheese, lettuce, tomato, sauce) suspended in mid-air. Emphasize photorealistic detail and a sense of motion. Dark, fiery background with glowing embers. Integrate text: 'MAGIC BURGER' as a prominent title, 'LIMITED TIME ONLY' as a secondary message, and '€6.99' in a starburst, all rendered with a fiery, glowing effect.”
AI Judge Analysis
GPT Image 1.5
- + Excellent adherence to the 'exploded' layout request with clear separation of ingredients.
- + All required text elements are rendered perfectly with the specified glowing effect.
- + High level of photorealistic detail in the food textures and embers.
- − The composition feels a bit crowded with the large starburst overlapping the burger elements.
Stable Diffusion 3.5 Large Turbo
- + Dynamic use of fire and smoke creates a strong atmospheric effect.
- + Good vertical symmetry and lighting.
- − Failed to include any of the requested text elements.
- − The burger is not 'exploded' but mostly intact, missing the core layout instruction.
- − Visual style is more digital/illustrative than photorealistic.
Verdict: GPT Image 1.5 followed the prompt instructions near-perfectly, successfully including the specific text phrases, the exploded layout, and the price starburst. Stable Diffusion 3.5 Large Turbo failed to include any text or the exploded layout, resulting in a generic burger image that missed several key requirements.
Chalkboard Menu
Text-to-Image
“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
GPT Image 1.5
- + Excellent text accuracy with near-perfect rendering of the complex menu items.
- + Authentic chalk texture and handwriting style that looks truly hand-drawn.
- + Successfully followed the date and price requirements.
- − The composition is very tight, focusing only on the board rather than the 'cozy café' environment.
Stable Diffusion 3.5 Large Turbo
- + Successfully rendered a 'cozy café' environment with plants and lighting.
- + Clean composition with a realistic frame around the board.
- − Severe spelling errors and gibberish text in the menu items.
- − The text style looks like a digital font rather than natural chalk handwriting.
- − Failed to include the specific year and many requested menu details.
Verdict: GPT Image 1.5 is the clear winner: it rendered the specific, complex text requested with near-perfect accuracy while maintaining a highly realistic chalk texture throughout. Stable Diffusion 3.5 Large Turbo struggled significantly with the text, producing illegible words and failing to adhere to the handwritten style requirement.
Stable Diffusion 3.5 Large Turbo is a distilled version of SD 3.5 Large that generates high-quality images in just 4 steps, offering faster inference and lower costs.