DALL-E 2 vs DALL-E 3
Head-to-head across 13 challenges
DALL-E 2: 20.0% win rate
Ties: 20.0%
DALL-E 3: 60.0% win rate
Challenge Results
Geometric Composition
Text-to-Image: “A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”
AI Judge Analysis
DALL-E 2
- + Decent glass reflection effects
- + Good color saturation
- − Failed almost all of the spatial relationships in the prompt
- − The blue sphere is missing; instead, there is a giant blue pot in the background
- − The red element is inside the cube rather than on top of it
DALL-E 3
- + Excellent photographic quality and detail
- + Correctly included all requested elements like the red book and blue sphere
- + Sophisticated lighting and texture on the wooden surfaces
- − Swapped the spatial positions of the book and sphere
- − The cube has a wooden frame not requested in the prompt
Verdict: DALL-E 2 struggled significantly with the prompt instructions, failing to place the sphere inside the cube and misidentifying the plant background as a giant blue pot. DALL-E 3 successfully rendered all objects with high visual fidelity, though it did swap the vertical order of the book and the sphere. DALL-E 3 is the clear winner for its superior composition, clarity, and adherence to the complex list of objects.
Candid Street Photography
Text-to-Image: “A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”
AI Judge Analysis
DALL-E 2
- + Successfully captures the requested imperfect framing and blurry candid feel
- + Good use of shallow depth of field on the wet pavement reflections
- − The subject is heavily obscured and out of focus, failing to show the elderly man clearly
- − Poor resolution and overall muddy visual quality
DALL-E 3
- + Excellent adherence to all prompt elements including subject, red bicycle, and reflections
- + High visual clarity with realistic skin textures and atmospheric lighting
- + Creative use of foreground framing and background motion blur
- − Slightly more 'stylized' than requested despite the 'no stylization' instruction
- − Anatomical issues with the man's feet appearing merged or distorted
Verdict: DALL-E 2 produced a very abstract and low-quality image that failed to clearly depict the requested subject. DALL-E 3, however, followed the prompt meticulously, creating a high-detail, cinematic scene with excellent reflections and a clear narrative, despite some minor AI artifacts in the anatomy.
Fantasy Warrior
Text-to-Image: “Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”
AI Judge Analysis
DALL-E 2
- + Features warm lighting and bokeh as requested.
- − Failed to render a human face, appearing instead like a melting metal statue or macro photography of a miniature.
- − Lacks clarity and overall resolution.
- − Completely misses the specific details of hair, beads, and leather straps.
DALL-E 3
- + Excellent adherence to all prompt details including braided hair with beads, scars, and ornate engraving.
- + High visual clarity with realistic skin textures and lifelike eyes.
- + Effective use of cinematic lighting and bokeh sparks to create mood.
- − The sparks look a bit uniform and digital in their distribution.
- − The armor engraving is slightly too busy and complex.
Verdict: DALL-E 2 produced an abstract, muddy image that failed to depict a person, looking more like melted metal or a macro shot of a figurine. DALL-E 3 followed every aspect of the prompt with high fidelity, delivering a detailed, cinematic portrait of a paladin with complex textures and clear features.
Modern Clean Menu
Text-to-Image: “Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”
AI Judge Analysis
DALL-E 2
- + Large, bold sans-serif typography
- + High-contrast visual style
- − Nonsensical, garbled text
- − Food photos are abstract, fragmented, and unappetizing
- − Poor layout that resembles a magazine spread more than a restaurant menu
DALL-E 3
- + Excellent adherence to the grid layout requirement
- + Clear sections for appetizers, pizza, and mains
- + Clean, professional aesthetic with vibrant food photography
- − Minor spelling errors in headers (e.g., 'PIZAS')
- − The typography for small details remains slightly illegible
Verdict: DALL-E 3 significantly outperforms DALL-E 2 by accurately interpreting the layout requirements and providing clear, appetizing food photography. While DALL-E 2 produced an abstract and unreadable design, DALL-E 3 delivered a functional and aesthetically pleasing menu template that followed every prompt instruction.
Magic Burger Explosion: Fiery Photorealism Challenge
Text-to-Image: “Ad for 'Magic Burger'. Dynamic, exploded burger with all components (bun, patty, cheese, lettuce, tomato, sauce) suspended in mid-air. Emphasize photorealistic detail and a sense of motion. Dark, fiery background with glowing embers. Integrate text: 'MAGIC BURGER' as a prominent title, 'LIMITED TIME ONLY' as a secondary message, and '€6.99' in a starburst, all rendered with a fiery, glowing effect.”
AI Judge Analysis
DALL-E 2
- + Strong fiery atmosphere that matches the energy of the prompt
- + Dynamic use of glowing embers and light trails
- − Text is heavily garbled and misspelled (e.g., 'MARGIC', 'BAGUEC')
- − Low visual clarity with messy, unappetizing textures on the burger components
- − Failed to include the price or starburst element
DALL-E 3
- + Excellent photographic detail on ingredients like the patty and sesame bun
- + Clear composition that successfully suspends all requested components in mid-air
- + Modern, professional layout suitable for an actual advertisement
- − Spelling errors in primary text (e.g., 'BURGR', 'Limiited')
- − The background environment is more of a studio floor than the requested dark fiery background
Verdict: DALL-E 3 (Image B) is the clear winner as it provides a professional, high-resolution advertisement with clear ingredient separation, whereas DALL-E 2 (Image A) produces a low-quality, abstract image with unreadable text. Although DALL-E 3 contains minor typos like 'BURGR', its overall execution of the 'exploded burger' concept is far superior in both realism and composition.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
DALL-E 2
- + Authentic chalk texture and smearing effects.
- − Text is complete gibberish and does not follow the prompt's specific content.
- − Extremely low image resolution and lack of context.
- − Failed to render the required date or specific menu items.
DALL-E 3
- + Followed the specific header and date instructions accurately.
- + High resolution with a beautiful cozy café atmosphere and lighting.
- + Successfully rendered the requested menu items with recognizable spelling and chalk-style typography.
- − Several spelling errors like 'Trufle', 'Occtus', and 'Riototo'.
- − The menu layout is somewhat cluttered compared to the simple list requested.
- − Prices do not match the prompt (e.g., $234 instead of $24).
Verdict: DALL-E 2 produced an abstract image with no readable text, failing the prompt almost entirely. DALL-E 3 successfully adhered to the complex text requirements, including the specific date and most menu keywords, while providing a high-quality environmental context, despite some spelling and numerical errors.
The Reversed Rodeo
Text-to-Image: “Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”
AI Judge Analysis
DALL-E 2
- + Attempts a surrealist painting style consistent with the 'surreal' prompt.
- + Follows the positioning instruction by placing the horse elements in a non-standard way.
- − Extremely low visual quality and resolution.
- − Anatomically incoherent textures and shapes.
- − Fails to clearly represent a horse riding an astronaut.
DALL-E 3
- + Excellent high-resolution detail and cinematic lighting.
- + Clear representation of an astronaut and a horse in space.
- + High technical polish with intricate armor and bridle details.
- − Fails the specific prompt instruction to have the 'horse on top' (astronaut is riding the horse).
Verdict: DALL-E 2 produced an image with very poor clarity and artifacts, though it attempted a more abstract 'surreal' configuration. DALL-E 3 produced a technically stunning, high-detail cinematic image, but it failed the negative constraint to have the horse on top, reverting to a standard astronaut-riding-horse composition. Despite the instruction failure, DALL-E 3 is the preferred image due to the massive disparity in visual quality.
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
DALL-E 2
- − Failed completely to follow the prompt.
- − Generated an image of a black handbag instead of a taxi scene.
- − Irrelevant to the user request.
DALL-E 3
- + Excellent adherence to all complex prompt elements including the capybara driver and businesswoman.
- + High visual quality with realistic lighting and fur textures.
- + Captures the requested mood and 'bored' expression of the passenger perfectly.
- − The capybara's jacket is yellow instead of the requested dark jacket.
- − The perspective makes the capybara look as if it is in the passenger seat rather than the driver seat.
Verdict: DALL-E 2 (Image A) failed to understand the prompt entirely, providing a random image of a handbag. DALL-E 3 (Image B) followed the prompt with high fidelity, creating a humorous and high-quality scene that captured the specific details of the capybara driver and the disinterested passenger.
The Halloween Invitation
Text-to-Image: “Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”
AI Judge Analysis
DALL-E 2
- + Captures a strong vintage, aged parchment aesthetic
- + The border style feels organic and gothic
- − Text is largely nonsensical and illegible
- − Fails to include several prompted elements like the jack-o-lantern and specific event details
- − Low visual clarity and muddy composition
DALL-E 3
- + Highly detailed composition with clear webs, thorns, and twisted trees
- + Includes the central glowing jack-o-lantern as requested
- + Text rendering is significantly more legible and adheres more closely to requested details like the date and title
- − Text contains several spelling errors and garbled words in the finer print
- − The layout is a bit cluttered with too many small decorative elements
Verdict: DALL-E 3 is the clear winner as it successfully follows the complex prompt requirements, including the jack-o-lantern, webs, and specific event details. DALL-E 2 fails significantly on text rendering and misses the central subject (the pumpkin) entirely, providing only a vague atmospheric background.
Isometric Miniature Diorama Scenes
Text-to-Image: “Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”
AI Judge Analysis
DALL-E 2
- + Matches the solid light blue background color perfectly.
- + Isometric 45-degree angle is accurately maintained.
- − Failed to render recognizable sushi, showing abstract blobs instead.
- − Text is misspelled ('Sush') and missing the 'JAPAN' requirement.
- − Poor visual quality with low-detail 3D objects and harsh lighting.
DALL-E 3
- + Excellent 3D miniature cartoon style with soft, rounded textures.
- + Includes all requested elements: diorama base, flag icon, and high-quality sushi models.
- + Perfectly centered and ultra-clean composition.
- − Placed the text on the side of the base rather than at the top-center of the image.
- − Omitted the specific word 'SUSHI' from the text elements.
Verdict: DALL-E 3 (Image B) significantly outperforms DALL-E 2 (Image A) in visual quality and style adherence, creating a charming 3D diorama that matches the 'cartoon' and 'PBR' prompts. While DALL-E 3 struggled with the exact placement of text, DALL-E 2 failed on almost every metric, including spelling, sushi recognition, and overall composition.
Adorable Baby Animals in Sunny Meadow
Text-to-Image: “Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”
AI Judge Analysis
DALL-E 2
- + Captures a sense of motion and action
- + Natural-looking golden lighting in the background
- − Failed to include most of the requested animals
- − Severe anatomical artifacts and blurry textures
- − The butterfly is distorted and lacks detail
DALL-E 3
- + Successfully includes all requested animal types in high detail
- + Outstanding fur texture and expressive facial features
- + Rich composition with vibrant wildflowers and clear butterfly rendering
- − Leans more toward an 'illustrative' aesthetic than hyper-photorealism
- − The lighting is somewhat overly stylized and fantasy-like
Verdict: DALL-E 3 is the clear winner as it successfully follows the complex multi-subject prompt, whereas DALL-E 2 fails to generate the kitten or bunny and produces significant visual artifacts. DALL-E 3 provides a beautiful, high-resolution composition with excellent textures, despite having a slightly more digital-art feel than pure photography.
Vintage Cafe Logo
Text-to-Image: “Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”
AI Judge Analysis
DALL-E 2
- + Simple minimalist layout
- + Adheres to the color palette
- − Text is nonsensical and does not follow the prompt
- − Cloche illustration is very basic and lacks the requested steam
- − Significant artifacts in the textures and lower typography
DALL-E 3
- + Excellent typography rendering for 'Est. 1720'
- + Highly detailed vector emblem style with good use of stippling and texture
- + Follows all prompt elements including the cloche, steam, and banner
- − Failed to include the specific name 'Caffè Florian', substituting it with 'Coffee House'
- − Slightly less 'minimalist' than requested, leaning more towards ornate
Verdict: DALL-E 2 produced a very low-quality image with garbled text and a lack of detail. DALL-E 3 produced a professional-grade vintage logo with excellent texture and layout, although it failed to use the specific brand name requested in the prompt. DALL-E 3 is the clear winner for its superior visual quality and adherence to the stylistic elements of the prompt.
Apollo 11: Journey to Tranquility
Text-to-Image: “Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vector style, crisp lines, consistent iconography, subtle gradients only. Steps (stop at landing): 1. Launch (Saturn V icon) 2. Earth Orbit (Earth + orbit ring icon) 3. Translunar (trajectory arc icon) 4. Lunar Orbit (Moon + orbit ring icon) 5. Descent (lunar module descending icon) 6. Landing (lunar module on the surface icon) Small supporting elements (minimal text): • Crew strip: three silhouette icons with only last names: Armstrong, Aldrin, Collins. • Landing site marker: Moon pin labeled "Tranquility" only. Layout constraints: generous margins, large readable labels, clean background with subtle stars. Vector-only, print-poster look, high resolution.”
AI Judge Analysis
DALL-E 2
- + Follows the requested color palette well.
- + Captures an abstract technical schematic feel.
- − Failed significantly on text rendering with severe misspellings like 'ALLPOO'.
- − The layout is cluttered and does not clearly show the requested 6-step infographic sequence.
- − Low visual clarity with many compressed, digital artifacts.
DALL-E 3
- + Excellent adherence to the infographic structure and layout requested.
- + Very clean vector aesthetic with high-quality icons and clear steps.
- + Reasonable text rendering for the main titles, despite some minor spelling variations.
- − Included a space shuttle silhouette, which is historically inaccurate for the Apollo 11 mission.
- − Generated three layout variations in one image instead of a single poster.
- − Features some hallucinated icons that don't directly map to the 6 specific steps requested.
Verdict: DALL-E 3 is the clear winner as it successfully interprets the 'infographic poster' format and provides a clean, modern aesthetic with logical flow. DALL-E 2 fails on both legibility and structure, producing nonsensical text and a disorganized layout that does not fulfill the step-by-step requirements.
DALL-E 2
OpenAI's legacy image generation model, supporting image generation, masked edits (inpainting), and variations
DALL-E 3
OpenAI's previous generation image model with higher quality than DALL-E 2 and support for larger resolutions