DALL-E 2 (OpenAI) vs. Z-Image Turbo (Alibaba)

Settled by community votes across 11 shared challenges, with an AI judge weighing in on each.

DALL-E 2

17.7 arena score

#37 of 44 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.

Z-Image Turbo

24.7 arena score

#15 of 44 in Text-to-Image

Vote tally

Where the votes landed

DALL-E 2: 0.0% win rate

Ties: 0.0%

Z-Image Turbo: 100.0% win rate

Shared challenges: 11
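As a rough illustration of how tally percentages like these relate to individual outcomes, here is a minimal Python sketch. It assumes each shared challenge resolves to a single outcome (a win for one model or a tie); the arena's actual vote aggregation is not described on this page, so the function name `tally_shares` and the outcome labels are hypothetical.

```python
from collections import Counter

# Hypothetical outcome labels; the arena's real identifiers are not given here.
LABELS = ("dall-e-2", "tie", "z-image-turbo")

def tally_shares(outcomes):
    """Return each label's share of the outcomes as a percentage (one decimal)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(outcomes)  # labels that never occur count as 0
    total = len(outcomes)
    return {label: round(100 * counts[label] / total, 1) for label in LABELS}

# Assumption for illustration: all 11 shared challenges went to Z-Image Turbo.
shares = tally_shares(["z-image-turbo"] * 11)
print(shares)
```

Under that assumption, the three shares match the win rates and tie rate reported in the tally.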

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

Votes: DALL-E 2 0% wins, 0% ties, Z-Image Turbo 100% wins

AI Judge Analysis

DALL-E 2

+ Decent lighting and reflection on the table surface.

− Failed almost all spatial instructions, including placing the sphere inside and the book on top.
− Confused the blue sphere with a blue plant pot.
− Generally low resolution and blurry compared to Z-Image Turbo.

Z-Image Turbo

+ Perfect adherence to all spatial instructions and object placements.
+ High visual clarity with realistic textures on the book and glass.
+ Accurate lighting direction and shadow placement.

− The glass cube appears to have a mirrored base which wasn't specifically requested, but it adds to the realism.

Verdict: Z-Image Turbo followed every detail of the complex spatial prompt perfectly, placing the blue sphere inside the cube and the red book on top as requested. DALL-E 2 failed significantly, conflating prompt elements and failing to render the requested objects in their correct positions. Z-Image Turbo is the clear winner for its superior prompt adherence and photographic quality.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

AI Judge Analysis

DALL-E 2

+ Strong bokeh and shallow depth of field as requested
+ Captures an 'imperfect framing' and 'candid' feel with its composition

− The subject is completely out of focus and unrecognizable
− Low technical quality with muddy textures and poor resolution

Z-Image Turbo

+ Excellent depiction of the elderly man with natural skin texture
+ Accurately represents the red bicycle and wet pavement reflections
+ High clarity and realistic lighting

− Failed to include 'motion blur from passing cars'
− The depth of field is deeper than the requested 'shallow' 50mm look

Verdict: Z-Image Turbo is the clear winner as it depicts the actual subject and environment described with high fidelity, despite missing the motion blur requirement. DALL-E 2 produced an unusable image where the focal point is misplaced, leaving the main subject completely blurred and making the composition incoherent.

Fantasy Warrior

Text-to-Image

“Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”

AI Judge Analysis

DALL-E 2

+ Captures a gritty, textured feel for 'battle-worn'

− Extremely poor resolution and lack of clarity
− Failed to include braided hair or distinct beads
− Distorted anatomy and unrecognizable features

Z-Image Turbo

+ Excellent adherence to all prompt details including braided hair with beads and engraved armor
+ Superior photorealistic quality and lighting
+ Effective use of bokeh and torchlight reflections

− Torch flame appears slightly detached from the torch head

Verdict: DALL-E 2 produced a blurry, abstract image that failed to capture most of the specific prompt details like the hair braids or lifelike eyes. Z-Image Turbo delivered a high-quality, professional portrait that perfectly executed every requirement, from the intricate engraving on the armor to the subtle scars and beads in the hair.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

Votes: DALL-E 2 0% wins, 0% ties, Z-Image Turbo 100% wins

AI Judge Analysis

DALL-E 2

+ Strong bold sans-serif typography
+ High contrast minimalist aesthetic

− Nonsensical, fragmented food photos that do not resemble a grid
− Illegible small text and abstract shapes that look more like an art book than a menu
− Fails to include specific requested sections like pizza or mains

Z-Image Turbo

+ Excellent adherence to the grid layout for food photos
+ Clearly defined sections for Appetizers, Pizza, and 'Mans' (Mains)
+ Professional menu layout with pricing and clean vibrant accents

− Slight spelling errors in text ('MANS', 'SETIIION')
− Repetitive pasta images in the top grid

Verdict: Z-Image Turbo is the clear winner as it directly follows the prompt's structural requirements, creating a functional menu layout with a clear grid and specific sections. DALL-E 2 produced an abstract design that, while visually striking, fails the basic requirements of a menu design and features highly distorted, fragmented food imagery.

Magic Burger Explosion: Fiery Photorealism Challenge

Text-to-Image

“Ad for 'Magic Burger'. Dynamic, exploded burger with all components (bun, patty, cheese, lettuce, tomato, sauce) suspended in mid-air. Emphasize photorealistic detail and a sense of motion. Dark, fiery background with glowing embers. Integrate text: 'MAGIC BURGER' as a prominent title, 'LIMITED TIME ONLY' as a secondary message, and '€6.99' in a starburst, all rendered with a fiery, glowing effect.”

AI Judge Analysis

DALL-E 2

+ Successfully captures a sense of fiery motion and explosion.
+ Includes a glowing, ember-filled background.

− Text is largely illegible and misspelled ('MARGIC BAGUEC').
− Image quality is low-resolution and lacks photorealistic detail.
− The burger components are messy and not clearly defined.

Z-Image Turbo

+ Excellent text rendering, correctly spelling all requested phrases and symbols.
+ Exceptional photorealistic detail in the food textures.
+ Perfectly adheres to the layout requests, including the starburst for the price.

− The burger is floating/hovering rather than shown as a dynamic 'exploded' view of individual components.
− The background fire feels slightly static compared to the burger.

Verdict: Z-Image Turbo is the clear winner as it produced a professional-grade advertisement with perfect text legibility and high-quality textures. Z-Image Turbo missed the 'exploded' component request, but DALL-E 2 failed significantly on text rendering, image clarity, and overall aesthetic appeal.

Chalkboard Menu

Text-to-Image

“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”

AI Judge Analysis

DALL-E 2

+ The chalk texture on the individual strokes looks authentic.

− The text is completely illegible gibberish.
− The prompt's specific menu items and date were entirely ignored.
− The composition is crowded and disorganized.

Z-Image Turbo

+ Excellent prompt adherence with nearly perfect spelling of specific menu items.
+ The layout is clean, balanced, and easy to read.
+ Realistic chalk smudge effects on the background enhance the chalkboard feel.

− Minor spelling error in 'Mustroom' for 'Mushroom'.
− The handwriting is almost too perfect, leaning slightly towards a digital font feel in places.

Verdict: Z-Image Turbo is the clear winner as it successfully rendered almost all the specific text requested in the prompt with high legibility. DALL-E 2 failed to produce any meaningful text, resulting in a board of illegible scribbles that did not follow the instructions.

The Capybara Taxi Driver

Text-to-Image

“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”

AI Judge Analysis

DALL-E 2

+ Texture representation is decent for a leather object

− Failed completely to follow the prompt
− Image shows a black leather bag instead of a taxi scene

Z-Image Turbo

+ Excellent adherence to all prompt details including the capybara driver and bored passenger
+ High visual quality with realistic lighting and fur textures
+ Effective use of depth of field for the Manhattan background

− Small anatomical glitch with the capybara's hand/paw on the wheel

Verdict: DALL-E 2 suffered a complete failure, providing an irrelevant image of a black handbag. Z-Image Turbo followed the complex prompt accurately, delivering a high-quality, photorealistic image of the capybara taxi driver and the human passenger in a convincing urban setting.

The Halloween Invitation

Text-to-Image

“Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”

AI Judge Analysis

DALL-E 2

+ Captures a vintage, hand-painted aesthetic
+ Follows the twisted tree and dark parchment concept in a stylistic way

− Text is completely illegible and nonsensical
− Low resolution with messy, smeared textures
− Fails to include specific event details requested in the prompt

Z-Image Turbo

+ Excellent text rendering with almost perfect spelling of all requested details
+ High-quality, cinematic composition with clear details for the jack-o-lantern and webs
+ Strong adherence to all prompt elements including the scroll banner and specific date/location

− Minor typo in the location ('Archves' instead of 'Arches')
− The border thorns and webs look a bit like floating assets rather than integrated parchment texture

Verdict: Z-Image Turbo is the clear winner as it successfully rendered almost all specific text requirements, including the date, time, and title, whereas DALL-E 2 produced illegible scribbles. Z-Image Turbo also provided a much more polished, high-resolution aesthetic that perfectly captures the 'cinematic lighting' and composition requested.

Isometric Miniature Diorama Scenes

Text-to-Image

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”

AI Judge Analysis

DALL-E 2

+ Features a 3D isometric perspective platform
+ Distinct shadows create a sense of depth

− Failed significantly on text rendering, displaying 'Sush' instead of the requested phrases
− Poor image quality with low resolution and artifacting
− Missed nearly all prompt elements including the flag and 'JAPAN' text

Z-Image Turbo

+ Excellent text rendering of 'JAPAN' and 'SUSHI'
+ High-quality 3D miniature aesthetic with clean PBR textures
+ Perfect adherence to the 45-degree isometric composition and diorama base request

− Included the flag of China instead of the flag of Japan

Verdict: Z-Image Turbo is the clear winner as it followed almost every stylistic and compositional prompt requirement, producing a high-clarity 3D miniature. Z-Image Turbo hallucinated the wrong flag, but DALL-E 2 failed fundamentally on text, image quality, and basic prompt adherence, resulting in a low-resolution and incomplete image.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

AI Judge Analysis

DALL-E 2

+ Successfully captures a dynamic sense of motion with the running puppy.
+ Good lighting and color saturation in the meadow.

− Significant localized anatomical artifacts, particularly in the faces of the smaller animals.
− Low image resolution and a painterly/blurry texture rather than hyper-photorealistic.
− The butterfly and smaller animals are poorly rendered and distorted.

Z-Image Turbo

+ Excellent anatomical accuracy for all four requested animals.
+ Crystal clear 8K quality with ultra-detailed fur textures as requested.
+ Beautiful composition that clearly shows the 'tumbling together' interaction.

− The lighting is a bit uniform for a golden sunrise with god rays, leaning toward a studio-lit look.

Verdict: Z-Image Turbo is the clear winner here, successfully rendering all four distinct animal types with high fidelity and adorable expressions. In contrast, DALL-E 2 struggled with the complex multi-subject prompt, resulting in distorted features and a lack of the requested hyper-photorealism.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

AI Judge Analysis

DALL-E 2

+ Follows the warm brown and cream color palette.
+ Includes a cloche dome element.

− Text is completely illegible and gibberish.
− The steam and banner elements are poorly rendered and chaotic.
− Lacks the minimalist, clean look of a vector logo.

Z-Image Turbo

+ Perfect text rendering for both 'Caffè Florian' and 'Est. 1720'.
+ Clean, professional vector minimalist style with a subtle background texture.
+ Accurate interpretation of all prompt elements including the cloche and steam.

− The 'banner' for the date is represented by line separators rather than a decorative physical banner.

Verdict: Z-Image Turbo is the clear winner as it successfully renders the requested text and maintains a clean, professional vector aesthetic. DALL-E 2 fails significantly on text legibility and graphic coherence, producing an unusable and messy design.

Next steps

Explore each model

DALL-E 2

OpenAI

OpenAI's legacy image generation model supporting generations, edits with masks (inpainting), and variations

Z-Image Turbo

Alibaba

Tongyi-MAI's 6-billion parameter distilled text-to-image model optimized for speed, achieving high-quality generation in 8 steps or fewer with support for bilingual text rendering
