DALL-E 3 (OpenAI) vs. Grok Imagine Image (xAI)

Settled by community votes across 11 shared challenges, with an AI judge weighing in on each.

DALL-E 3

18.5 arena score

#35 of 45 in Text-to-Image

Grok Imagine Image

24.1 arena score

#19 of 45 in Text-to-Image

Skill signature

Not enough comparable category data. The chart appears once both models have ratings across at least three shared arena categories.

Vote tally

Where the votes landed

DALL-E 3: 50.0% win rate

Ties: 0.0%

Grok Imagine Image: 50.0% win rate

Shared challenges: 11

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

Votes: DALL-E 3 100% wins, ties 0%, Grok Imagine Image 0% wins

AI Judge Analysis

DALL-E 3

+ High visual quality and artistic lighting effects
+ Exquisite rendering of materials like wood and paper textures

− Failed the prompt's spatial logic: placed the book inside the cube and the sphere on the book
− Used a wooden frame instead of a simple glass cube

Grok Imagine Image

+ Perfect adherence to all spatial instructions
+ Realistic lighting and shadows that match the window light description
+ Clean and accurate representation of the glass cube and blue sphere

− The sphere appears to be floating mid-air inside the cube without support
− The plant behind the glass is slightly blurred/out of focus

Verdict: Grok Imagine Image followed the prompt's spatial instructions perfectly, placing the sphere inside the cube and the book on top. DALL-E 3 produced a more artistic and detailed image, but completely failed the logical placement of the objects by putting the book inside and the sphere on top of it.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

AI Judge Analysis

DALL-E 3

+ Excellent use of reflections and cinematic lighting
+ Creative foreground framing that enhances the 'candid' feel
+ High level of detail in the character and bicycle texture

− Anatomical errors in the man's feet and crouched posture
− The car in the background looks slightly static despite the prompt asking for motion blur

Grok Imagine Image

+ Successfully captures realistic motion blur on the passing car
+ Authentic 'candid' street photography aesthetic with realistic clothing
+ Naturalistic lighting and color grading that avoids over-stylization

− The man's face and hands are somewhat obscured and muddy
− The composition overall is a bit flat compared to the depth in the other image

Verdict: Grok Imagine Image followed the technical camera prompts more accurately, particularly regarding the motion blur and the gritty, unstylized realism of a candid street photo. DALL-E 3 produced a more visually striking and polished 'cinematic' image, but it featured significant anatomical distortions and a more AI-typical painterly finish.

Fantasy Warrior

Text-to-Image

“Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”

Votes: DALL-E 3 0% wins, ties 0%, Grok Imagine Image 100% wins

AI Judge Analysis

DALL-E 3

+ Excellent detail on the engraving and patina of the armor
+ Intense and lifelike eye detail
+ Strong adherence to the 'lifelike eyes' and 'bokeh sparks' requirements

− The facial features look a bit overly airbrushed/smooth despite the scars
− The hair beads/accessories look a bit cluttered and fantastical

Grok Imagine Image

+ Natural skin texture and more realistic lighting transitions
+ Specific detail on the leather straps and buckles as requested
+ More coherent braiding with visible beads as per the prompt

− Armor looks slightly less 'battle-worn' and more pristine compared to the skin
− The torchlight in the background is a bit distracting and less 'bokeh' than requested

Verdict: DALL-E 3 produces a more stylized and intense cinematic portrait with heavy focus on armor engravings, while Grok Imagine delivers a more grounded and realistic human texture with superior detail on the secondary elements like leather straps. Grok Imagine is the likely winner for its more naturalistic skin rendering and better interpretation of the braided hair and beads.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

AI Judge Analysis

DALL-E 3

+ Excellent grid-based composition for food photography
+ High-quality, realistic food images
+ Professional use of color blocking for sections

− Text is mostly illegible scribble
− Layout is more of a lookbook than a functional menu
− Repeats 'Pizza' several times incorrectly

Grok Imagine Image

+ Near-perfect text rendering for headings and menu items
+ Strictly follows the requested 'Appetizers, Pizza, Mains' categorization
+ Clean, functional typography with bold sans-serif fonts

− Food images are slightly less realistic than DALL-E 3's
− Duplicate menu items (e.g., Grilled Salmon, Steak Frites) used to fill space
− Food spacing is a bit cramped at the bottom

Verdict: Grok Imagine Image is the superior choice for this task because it generates legible, accurate text that follows the specific categorization requested in the prompt. While DALL-E 3 produces more aesthetically pleasing photography, it fails to deliver a functional menu design, whereas Grok provides a usable template with professional typography.

Magic Burger Explosion: Fiery Photorealism Challenge

Text-to-Image

“Ad for 'Magic Burger'. Dynamic, exploded burger with all components (bun, patty, cheese, lettuce, tomato, sauce) suspended in mid-air. Emphasize photorealistic detail and a sense of motion. Dark, fiery background with glowing embers. Integrate text: 'MAGIC BURGER' as a prominent title, 'LIMITED TIME ONLY' as a secondary message, and '€6.99' in a starburst, all rendered with a fiery, glowing effect.”

AI Judge Analysis

DALL-E 3

+ Excellent photorealistic lighting and texture on the food components
+ Great vertical composition creating a sense of height and explosion
+ High-quality rendering of the fiery ground and embers

− Multiple spelling errors in text including 'MAGIC BURGR' and 'Limiited'
− Failed to place the price inside a starburst shape

Grok Imagine Image

+ Perfect text rendering for all requested strings with zero spelling errors
+ Accurately included the price inside a starburst as requested
+ Strong fiery aesthetic applied to the title and background

− The 'exploded' effect is less cohesive, with components appearing scattered rather than arranged in a vertical stack
− The lighting on the food is a bit flat compared to the dramatic glow in DALL-E 3's image

Verdict: Grok Imagine Image is the winner because it successfully followed all text instructions, including the specific 'starburst' requirement and correct spelling. While DALL-E 3 produced a more visually stunning food render with better lighting, its frequent typos ('BURGR', 'Limiited') make it unusable as a final advertisement.

Chalkboard Menu

Text-to-Image

“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”

AI Judge Analysis

DALL-E 3

+ Excellent chalk texture and artistic flourishes
+ Cozy café atmosphere with warm lighting

− Numerous spelling errors like 'Trufle', 'Occtus', and 'Riototo'
− Text is cluttered and messy, failing to follow the specific order requested

Grok Imagine Image

+ Exceptional text rendering with 100% accuracy to the prompt
+ Captures the realistic chalk variations and texture perfectly
+ Clean, legible composition that looks like an actual handwritten board

− Simple composition compared to the more artistic background of the other model

Verdict: Grok Imagine followed the prompt precisely, rendering all requested text and prices with perfect spelling and a convincing handwritten chalk style. DALL-E 3 struggled significantly with spelling and coherence, hallucinating extra text and incorrect prices while failing to complete the third menu item correctly. Grok Imagine is the clear winner for its superior prompt adherence and text legibility.

The Capybara Taxi Driver

Text-to-Image

“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”

AI Judge Analysis

DALL-E 3

+ Excellent fur texture and lighting on the capybara.
+ Includes a realistic dashboard with working-style illumination.
+ Creatively includes a 'CAPYBARA' sign in the background bokeh.

− Failed to include the human passenger in the back seat.
− The capybara's hands look more like human fingers in fur gloves.

Grok Imagine Image

+ Perfectly captures both subjects as requested, including the bored businesswoman.
+ Accurate yellow cap as specified in the prompt.
+ Highly realistic cinematic composition from a front-windshield perspective.

− The capybara's claws are slightly too sharp/long compared to reality.
− The lighting on the woman's face is a bit flat compared to the driver side.

Verdict: Grok Imagine is the clear winner because it followed the entire prompt, correctly placing the human businesswoman in the back seat with the specified expression. DALL-E 3 produced a high-quality animal portrait but failed to include the second character and the bored interaction that was central to the scene's concept.

Isometric Miniature Diorama Scenes

Text-to-Image

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”

Votes: DALL-E 3 0% wins, ties 0%, Grok Imagine Image 100% wins

AI Judge Analysis

DALL-E 3

+ Excellent 3D miniature diorama feel with high-quality PBR-like lighting and subsurface scattering.
+ Creative integration of the text 'JAPAN' into the base structure.
+ Cohesive isometric perspective and clean aesthetic.

− Failed to include the word 'SUSHI' in text.
− Text is not placed at the 'top-center' as requested.
− The sushi design is slightly abstracted and less recognizable than traditional sushi.

Grok Imagine Image

+ Perfect adherence to text instructions, including 'JAPAN', 'SUSHI', and the flag icon at top-center.
+ Accurate variety of sushi (nigiri and maki) that looks appealing and clear.
+ Strict adherence to the 45-degree isometric prompt and diorama base.

− The textures are a bit flat and less 'refined' or 'realistic PBR' compared to the other model.
− The lighting lacks the depth and warmth found in the competitor's image.

Verdict: Grok Imagine is the winner because it followed all text prompts perfectly, specifically placing the required words and flag icon exactly where requested. While DALL-E 3 produced a more visually stunning 3D render with superior lighting, it failed to include all the requested text and ignored the specified layout for that text.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

AI Judge Analysis

DALL-E 3

+ Excellent depiction of god rays and soft, golden lighting
+ High level of detail in the fur texture and meadow flora
+ Includes all requested animals and butterflies

− Anatomical errors such as a bird-like body on one of the butterflies
− The scale of the animals is inconsistent, with the kitten appearing very small compared to the others
− Feels more digital illustration than hyper-photorealistic

Grok Imagine Image

+ Better sense of action and 'tumbling' as requested in the prompt
+ Faces are more proportional and have consistent expressive eyes
+ Colors are vibrant and the lighting feels warm and joyful

− Completely missed the 'butterflies' part of the prompt
− The fur texture has a slightly plastic, smoothed-out look in some areas
− Anatomy of the fox/puppy merger in the middle is a bit muddled

Verdict: DALL-E 3 followed the prompt more literally by including the butterflies, though it introduced strange artifacts like bird-butterfly hybrids. Grok Imagine captured the 'tumbling' action significantly better, creating a more dynamic and emotionally resonant scene, despite forgetting the butterflies and having a slightly more 'cartoonish' photorealism. DALL-E 3 is the winner for better prompt adherence and finer detail in the environment.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

Votes: DALL-E 3 100% wins, ties 0%, Grok Imagine Image 0% wins

AI Judge Analysis

DALL-E 3

+ Excellent use of subtle texture and vintage halftone effects.
+ Complex and balanced vector emblem composition.
+ Accurate inclusion of all requested elements including the cloche and date.

− Failed to include the specific text 'Caffè Florian', replacing it with generic text.
− The steam curls are a bit disconnected from the cloche.

Grok Imagine Image

+ Successfully rendered the specific brand name 'Caffè Florian' with correct accents.
+ Clean, minimalist aesthetic that fits a modern-retro logo style.
+ Good use of negative space for the steam and highlights.

− Redundant inclusion of the 'Est. 1720' text appearing twice.
− The cloche is awkwardly combined with a coffee cup/spoon silhouette, which was not requested.
− Lacks the 'vintage texture' requested, appearing very flat.

Verdict: While DALL-E 3 produced a far more aesthetically sophisticated and textured vintage emblem, it failed the primary text requirement by ignoring the name 'Caffè Florian'. Grok Imagine succeeded in including the correct brand name with clean typography, but the design is more generic and contains redundant text elements. DALL-E 3 is visually superior as a logo design, but Grok Imagine is more useful for actual branding due to text accuracy.

Apollo 11: Journey to Tranquility

Text-to-Image

“Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vector style, crisp lines, consistent iconography, subtle gradients only. Steps (stop at landing): 1. Launch (Saturn V icon) 2. Earth Orbit (Earth + orbit ring icon) 3. Translunar (trajectory arc icon) 4. Lunar Orbit (Moon + orbit ring icon) 5. Descent (lunar module descending icon) 6. Landing (lunar module on the surface icon) Small supporting elements (minimal text): • Crew strip: three silhouette icons with only last names: Armstrong, Aldrin, Collins. • Landing site marker: Moon pin labeled "Tranquility" only. Layout constraints: generous margins, large readable labels, clean background with subtle stars. Vector-only, print-poster look, high resolution.”

AI Judge Analysis

DALL-E 3

+ Features a highly sophisticated, professional aesthetic reminiscent of Mid-century modern design.
+ Excellent use of the specified NASA-inspired color palette.
+ Superior artistic composition with intricate textures and detailed vector work.

− Completely fails to follow the logical sequence of the requested steps.
− Contains significant spelling errors and nonsensical orbital mechanics.
− Includes space shuttles, which are historically inaccurate for the Apollo 11 mission.

Grok Imagine Image

+ Follows the requested 6-step logical sequence exactly as described in the prompt.
+ Renders legible text for the main headings and preserves the NASA logo correctly.
+ Accurately represents the Saturn V and Lunar Module silhouettes.

− Lacks the 'modern vector' feel, appearing more like a basic clipart arrangement.
− Minor spelling errors in the smaller supporting text (e.g., '3rajoory').
− The layout is somewhat crowded and lacks the artistic depth of the other model.

Verdict: DALL-E 3 creates a much more visually stunning and artistic poster, but it fails on every functional level of the prompt, including historical accuracy and the requested step-by-step logic. Grok Imagine strictly adheres to the requested information architecture and steps, making it much more useful as an actual infographic despite its more simplistic 'clipart' style. Grok Imagine is the winner for following all instructions and correctly interpreting the historical context.