DALL-E 2 (OpenAI) vs. Imagen 4.0 Fast Generate 001 (Google)

Settled by community votes across 3 shared challenges, with an AI judge weighing in on each.

DALL-E 2

17.7 arena score

#37 of 44 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.

Imagen 4.0 Fast Generate 001

17.1 arena score

#39 of 44 in Text-to-Image

Vote tally

Where the votes landed

DALL-E 2

0.0%

win rate

Ties

0.0%

Imagen 4.0 Fast Generate 001

100.0%

win rate


Shared challenges 3
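
As a rough illustration of how percentages like the ones above could be derived from raw votes, here is a minimal sketch. The page does not publish raw vote counts or its exact formula, so the function name `vote_shares` and the counts in the usage lines are hypothetical, and the rounding to one decimal is an assumption based on how the figures are displayed.

```python
def vote_shares(wins_a: int, ties: int, wins_b: int) -> tuple[float, float, float]:
    """Return (share_a, share_ties, share_b) as percentages of all votes.

    Hypothetical helper: the arena's real aggregation may differ.
    """
    total = wins_a + ties + wins_b
    if total == 0:
        # No votes cast yet: report zeros instead of dividing by zero.
        return (0.0, 0.0, 0.0)
    return (
        round(100.0 * wins_a / total, 1),
        round(100.0 * ties / total, 1),
        round(100.0 * wins_b / total, 1),
    )


# Hypothetical counts: every vote goes to the second model.
print(vote_shares(0, 0, 5))
# Hypothetical counts: a mixed result.
print(vote_shares(1, 1, 2))
```

With every vote going to one model, as in this matchup, the shares come out as 0.0, 0.0, and 100.0 regardless of how many votes were cast, so the tally alone says nothing about sample size.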

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

DALL-E 2

Imagen 4.0 Fast Generate 001

AI Judge Analysis

DALL-E 2

+ Successfully captures a red bicycle
+ Includes wet pavement reflections
+ Depicts shallow depth of field

− Subject is out of focus and identity is unclear
− Low overall image resolution and clarity

Imagen 4.0 Fast Generate 001

+ Clear representation of an elderly Japanese man
+ High visual quality and realistic skin textures
+ Excellent wet road reflections and car background

− The frame-within-a-frame composition looks more planned than the requested 'imperfect framing'
− Motion blur on cars is subtle

Verdict: Imagen 4.0 Fast Generate 001 provides a high-quality, coherent image that follows almost all prompt instructions including the subject's age and ethnicity. DALL-E 2 fails to keep the subject in focus, making it difficult to verify the identity of the person or the quality of the repair action.

Fantasy Warrior

Text-to-Image

“Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”

DALL-E 2

Imagen 4.0 Fast Generate 001

Community votes: DALL-E 2 0% wins, 0% ties, Imagen 4.0 Fast Generate 001 100% wins

AI Judge Analysis

DALL-E 2

+ Successfully interprets the fantasy setting and armor concept.
+ Captures the warm lighting and bokeh elements requested in the prompt.

− Resolution is low and details appear muddy or distorted.
− Lacks the requested lifelike eyes and braided hair detail.

Imagen 4.0 Fast Generate 001

+ High resolution with clear, realistic textures on the leather jacket.
+ Excellent composition and framing of the human subject within the environment.

− Completely ignores the prompt instructions regarding armor, paladins, and warm torchlight.
− Provides a modern-day setting instead of the requested fantasy scene.

Verdict: DALL-E 2 attempted to follow the prompt's thematic instructions but failed on technical execution and clarity. Imagen 4.0 Fast Generate 001 produced a high-quality, realistic image that is entirely irrelevant to the user's specific request. DALL-E 2 is preferred only because it stayed on-topic, despite the poor visual quality.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

DALL-E 2

Imagen 4.0 Fast Generate 001

AI Judge Analysis

DALL-E 2

+ Captures a sense of motion and playfulness
+ Includes the butterfly requested in the prompt

− Low visual fidelity with heavy artifacting and blurry textures
− Anatomy of the smaller animals is distorted and incoherent
− Fails to clearly represent all four distinct animals requested

Imagen 4.0 Fast Generate 001

+ Excellent photographic clarity and high-resolution fur textures
+ Accurately represents all four requested animal species
+ Striking lighting with effective use of backlighting and golden hour tones

− Failed to include the requested butterflies
− The scene is a static pose rather than the requested 'playfully chasing' action

Verdict: Imagen 4.0 Fast Generate 001 produces a much higher-quality image with clear, recognizable animals and beautiful lighting, though it fails to capture the 'chasing' action and the butterflies. DALL-E 2 attempts the requested action and elements but suffers from severe technical quality issues, resulting in a messy and anatomically incorrect composition. Imagen 4.0 Fast Generate 001 is the clear winner for its superior realism and detail.