DALL-E 3 (OpenAI) vs. Z-Image Turbo (Alibaba)

Settled by community votes across 11 shared challenges, with an AI judge weighing in on each.

DALL-E 3

18.5 arena score

#35 of 44 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.

Z-Image Turbo

24.7 arena score

#15 of 44 in Text-to-Image

Vote tally

Where the votes landed

DALL-E 3: 0.0% win rate

Ties: 0.0%

Z-Image Turbo: 100.0% win rate

Shared challenges 11
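
For readers who want to reproduce percentages like the ones above, here is a minimal sketch of how a win rate could be computed from raw vote counts. The page does not publish vote counts or its exact formula, so the function name `win_rates`, the choice to count ties in the denominator, and the sample counts are all assumptions made for illustration.

```python
def win_rates(wins_a: int, ties: int, wins_b: int) -> tuple:
    """Return (A win %, tie %, B win %), each rounded to one decimal place.

    Assumption: ties are included in the total, so the three shares sum to 100.
    """
    total = wins_a + ties + wins_b
    if total == 0:
        # No votes yet: report zeros rather than dividing by zero.
        return (0.0, 0.0, 0.0)
    return tuple(round(100 * n / total, 1) for n in (wins_a, ties, wins_b))


# Hypothetical counts (not taken from the page): every vote went to model B.
dalle_pct, tie_pct, zimage_pct = win_rates(0, 0, 3)
print(f"DALL-E 3: {dalle_pct}%  Ties: {tie_pct}%  Z-Image Turbo: {zimage_pct}%")
```

Under these assumptions, any tally in which one model receives every vote yields the 0.0% / 0.0% / 100.0% split shown above, regardless of how many votes were cast.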

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

DALL-E 3

Z-Image Turbo

AI Judge Analysis

DALL-E 3

+ High visual quality with intricate wood grain and glass reflections
+ Creative interpretation of the sphere containing a miniature landscape

− Failed multiple spatial instructions: the book is inside the cube instead of on top
− The sphere is on top of the book rather than just inside the cube

Z-Image Turbo

+ Perfect adherence to spatial prompts
+ Accurate placement of the red book on top of the glass cube
+ Correct rendering of the small blue sphere inside and a plant behind

− Lower resolution and more generic aesthetic compared to the alternative
− The glass cube has a mirrored bottom which wasn't requested

Verdict: While DALL-E 3 produced a more visually stunning image, it failed significantly on the spatial logic of the prompt by placing the book inside the cube. Z-Image Turbo followed every specific positioning instruction perfectly, making it the clear winner for adherence and functional accuracy.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

DALL-E 3

Z-Image Turbo

Community votes: DALL-E 3 0% wins, 0% ties, Z-Image Turbo 100% wins

AI Judge Analysis

DALL-E 3

+ Excellent atmospheric lighting and puddles with clear reflections
+ Successfully captured 'imperfect framing' with a foreground blur element
+ Strong cinematic quality with high-end photographic composition

− Anatomical issues in the man's feet and lower legs
− The bicycle has structural inconsistencies common in AI-generated wheels

Z-Image Turbo

+ Natural skin texture on the man's face and arms
+ Realistic rainfall effect and wet pavement texture
+ The bicycle structure is more coherent than in DALL-E 3's image

− Failed to include 'motion blur from passing cars' as cars are static and sharp
− Composition is very centered and lacks the requested 'cinematic' feel
− The man is holding the bike rather than actively 'repairing' it

Verdict: DALL-E 3 followed the complex stylistic prompts much better, capturing the cinematic lighting, motion blur, and creative framing requested. Z-Image Turbo produced a more realistic human skin texture but failed on several key prompt instructions, resulting in a generic and static snapshot rather than a cinematic candid photo.

Fantasy Warrior

Text-to-Image

“Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”

DALL-E 3

Z-Image Turbo

AI Judge Analysis

DALL-E 3

+ Excellent high-contrast cinematic lighting with strong orange bokeh.
+ Exceptional detail on the engraved metal and individual beard hairs.
+ Lifelike eyes with intricate iris patterns.

− The helmet design is somewhat over-engineered and covers more of the head than requested.
− The skin looks a bit overly polished/airbrushed despite the scars.

Z-Image Turbo

+ Naturalistic and realistic depiction of the dirt and bruising on the face.
+ Braided hair with beads is more clearly defined as requested.
+ The inclusion of the physical torch provides a logical source for the warm lighting.

− The overall image is slightly softer and lacks the extreme texture detail of the rival model.
− The bokeh effect is less pronounced and less artistic.

Verdict: Both models adhered well to the prompt, but DALL-E 3 produced a higher-fidelity, more cinematic image with striking detail on the plate armor and eyes. Z-Image Turbo delivered a more grounded and realistic interpretation of 'battle-worn', with superior hair braiding and natural skin textures, though it lacked the sharp, intricate engraving details found in DALL-E 3's image.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

DALL-E 3

Z-Image Turbo

AI Judge Analysis

DALL-E 3

+ Provides multiple layout variations in a single image.
+ High-quality, realistic food photography that feels professional.
+ Complex grid structure that fits the minimalist modern aesthetic.

− Text consists of illegible gibberish symbols.
− The presentation of four separate pages makes it harder to see one design clearly.

Z-Image Turbo

+ Exceptional text rendering for words like 'APPETIZERS' and 'PIZZA'.
+ Clean, highly organized grid layout that strongly adheres to the prompt.
+ Consistent lighting across all food photos.

− Spelling error 'MANS' instead of 'MAINS'.
− The food photography looks slightly more artificial/digital compared to DALL-E 3's.

Verdict: Both models followed the prompt well, but Z-Image Turbo is the clear winner for its functional design and readable sans-serif typography. While DALL-E 3 produced more realistic food photos, its text was completely illegible, whereas Z-Image Turbo created a believable menu layout that could almost be used for a real business.

Magic Burger Explosion: Fiery Photorealism Challenge

Text-to-Image

“Ad for 'Magic Burger'. Dynamic, exploded burger with all components (bun, patty, cheese, lettuce, tomato, sauce) suspended in mid-air. Emphasize photorealistic detail and a sense of motion. Dark, fiery background with glowing embers. Integrate text: 'MAGIC BURGER' as a prominent title, 'LIMITED TIME ONLY' as a secondary message, and '€6.99' in a starburst, all rendered with a fiery, glowing effect.”

DALL-E 3

Z-Image Turbo

AI Judge Analysis

DALL-E 3

+ Excellent depiction of the 'exploded' burger concept with distinct floating layers.
+ Captures a high sense of motion and energy with fire and glowing embers.
+ Strong photorealistic textures on the grilled meat and fresh vegetables.

− Significant spelling errors in the text, including 'MAGIC BURGR' and 'Limiited'.
− The price tag is rendered in a generic box rather than a starburst as requested.

Z-Image Turbo

+ Perfect text rendering with no spelling errors and a great glowing effect.
+ Accurate interpretation of the 'starburst' for the price tag.
+ High visual quality with realistic food textures and appealing composition.

− Failed to create an 'exploded' burger, showing a mostly assembled burger instead.
− Lacks the sense of mid-air suspension for individual components like lettuce and tomato.

Verdict: While DALL-E 3 followed the 'exploded' layout much better, its failure to spell the product name correctly and include the requested starburst makes it less professional. Z-Image Turbo produced perfect, usable text and satisfied the starburst requirement, though it missed the specific 'exploded' structural detail of the prompt. Z-Image Turbo is the likely winner for its polish and accuracy in text and specific graphic elements.

Chalkboard Menu

Text-to-Image

“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”

DALL-E 3

Z-Image Turbo

AI Judge Analysis

DALL-E 3

+ Excellent artistic lighting and atmosphere
+ Realistic chalk texture and decorative flourishes

− Significant spelling errors throughout the menu
− Layout is cluttered and difficult to read
− Failed to render the requested prices accurately

Z-Image Turbo

+ Highly accurate text rendering with almost no spelling errors
+ Clean and legible layout follows the prompt specifically
+ Very convincing handwritten chalk style

− Minor spelling error: 'Mustroom' instead of 'Mushroom'
− Less artistic 'cozy café' atmosphere compared to the other model

Verdict: Z-Image Turbo is the clear winner as it successfully rendered nearly the entire complex text prompt with high legibility and correct prices, whereas DALL-E 3 struggled significantly with spelling and layout logic. While DALL-E 3 created a more visually atmospheric scene, Z-Image Turbo's ability to follow the specific text instructions and maintain a consistent handwriting style makes it more useful for this task.

The Capybara Taxi Driver

Text-to-Image

“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”

DALL-E 3

Z-Image Turbo

AI Judge Analysis

DALL-E 3

+ Excellent interior detail with realistic dashboard and lighting.
+ Great atmosphere that captures the feel of Manhattan at night.
+ Creative background detail with 'CAPYBARA' signage on a building.

− Completely failed to include the human businesswoman in the back seat.
− The cap is black rather than the requested yellow.

Z-Image Turbo

+ Includes all prompted elements including the human passenger looking at a phone.
+ Accurately colored yellow cap for the driver.
+ Good interpretation of the 'bored' expression on the passenger.

− Anatomical issues with the capybara's hands which appear primate-like.
− The lighting is flat and looks like daylight/overcast rather than a vibrant NYC night.

Verdict: Z-Image Turbo is the clear winner for prompt adherence as it successfully included the businesswoman in the back seat, whereas DALL-E 3 completely ignored that part of the prompt. While DALL-E 3 produced a much more visually stunning and atmospheric interior, it failed the core requirements of the scene composition.

The Halloween Invitation

Text-to-Image

“Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”

DALL-E 3

Z-Image Turbo

AI Judge Analysis

DALL-E 3

+ Exquisite lighting and 3D-depth on the gothic frame elements
+ Strong atmosphere with complex silhouettes of trees and webs
+ High-quality 'cinematic' appearance that feels premium

− Text is largely unintelligible and includes many typos
− Failed to render the requested location (The Arches, NYC) correctly

Z-Image Turbo

+ Excellent text legibility and accuracy for almost all requested fields
+ Clear inclusion of the specific scroll banner and holiday details
+ Vibrant colors on the jack-o-lantern are eye-catching

− The 'Arches' is misspelled as 'Archves'
− The layout feels more like a digital collage than a cohesive vintage poster
− Vignette trees and web elements are somewhat generic compared to the other model

Verdict: While DALL-E 3 creates a much more atmospheric and artistically complex gothic poster, it fails significantly at providing legible text. Z-Image Turbo captures almost all specific text requirements and layout prompts correctly, making it functional as an invitation. For a request requiring specific event details, Z-Image Turbo is the superior choice despite the slightly lower artistic depth.

Isometric Miniature Diorama Scenes

Text-to-Image

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”

DALL-E 3

Z-Image Turbo

AI Judge Analysis

DALL-E 3

+ Excellent 3D miniature diorama feel with complex textures
+ Accurate representation of the Japanese flag
+ Clean isometric perspective with a professional finish

− Failed to place the text 'JAPAN' and 'SUSHI' at the top-center as requested
− The rice texture looks like large beads rather than grains

Z-Image Turbo

+ Perfect adherence to text placement and content requirements
+ Clean, minimal aesthetic that matches the 'cartoon' request well
+ Accurate 45-degree top-down isometric view

− Displayed the flag of China instead of the requested Japanese flag icon
− The diorama base is very simple compared to the '3D miniature' prompt

Verdict: While DALL-E 3 produced a more visually intricate and high-quality 3D diorama, Z-Image Turbo followed the specific layout instructions much more closely, including the 'JAPAN' and 'SUSHI' text. However, Z-Image Turbo's inclusion of a Chinese flag for a Japanese sushi prompt is a significant cultural accuracy error, making DALL-E 3 the more reliable output despite the text placement issues.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

DALL-E 3

Z-Image Turbo

AI Judge Analysis

DALL-E 3

+ Excellent adherence to lighting requests with distinct god rays and dew sparkles
+ Strong artistic consistency and vibrant color palette

− Fails the hyper-photorealistic requirement by looking more like a digital illustration or 3D animation
− Strange hybrid 'butterfly-animals' appear in the sky instead of normal butterflies

Z-Image Turbo

+ Successfully achieves a much more photorealistic look as requested
+ Better anatomy and natural integration of the four animals requested
+ Accurate depiction of butterflies and dew drops

− Lighting is a bit flatter compared to the dramatic god rays requested
− One of the puppy's paws is oddly positioned/blended into the bunny

Verdict: While DALL-E 3 captures the magical atmosphere and god rays more vividly, it completely fails the 'photorealistic' requirement, delivering a 3D-style illustration with bizarre animal-butterfly hybrids. Z-Image Turbo captures the essence of the prompt with realistic fur textures and a believable meadow setting, making it the superior choice for this specific request.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

DALL-E 3

Z-Image Turbo

AI Judge Analysis

DALL-E 3

+ Excellent use of texture and vintage stippling effects.
+ Sophisticated layout with intricate border details.
+ High-quality vector emblem aesthetic.

− Failed to follow the core text instruction, displaying 'COFFEE HOUSE' instead of 'Caffè Florian'.
− Layout is more 'complex vintage' than 'minimalist'.

Z-Image Turbo

+ Perfect adherence to text prompt including the specific name 'Caffè Florian'.
+ Closer adherence to the 'minimalist' style requested.
+ Clean, professional typography and layout.

− The 'Est. 1720' is not on a 'banner' as requested.
− The steam effect is very simple compared to the artistic curls in the other model.

Verdict: While DALL-E 3 produced a more visually rich and textured badge, it failed the fundamental task of including the specific restaurant name, instead using generic text. Z-Image Turbo followed all text instructions perfectly and captured the requested minimalist aesthetic, making it the more successful logo for the specific brief.
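
Across the eleven verdicts above, the AI judge's preferences can be tallied with a short script. This is only a sketch: the winner assigned to each challenge is a reading of the verdict paragraphs, not a figure published by the arena. 'Fantasy Warrior' is recorded as having no clear winner because its verdict names none, and 'Magic Burger Explosion' is counted for Z-Image Turbo although the verdict only calls it the 'likely winner'.

```python
from collections import Counter

# Winner per challenge as read from each "Verdict" paragraph.
# This mapping is an interpretation of the text, not arena data.
# None means the verdict does not name a single winner.
verdicts = {
    "Geometric Composition": "Z-Image Turbo",
    "Candid Street Photography": "DALL-E 3",
    "Fantasy Warrior": None,
    "Modern Clean Menu": "Z-Image Turbo",
    "Magic Burger Explosion": "Z-Image Turbo",  # "likely winner" in the verdict
    "Chalkboard Menu": "Z-Image Turbo",
    "The Capybara Taxi Driver": "Z-Image Turbo",
    "The Halloween Invitation": "Z-Image Turbo",
    "Isometric Miniature Diorama Scenes": "DALL-E 3",
    "Adorable Baby Animals in Sunny Meadow": "Z-Image Turbo",
    "Vintage Cafe Logo": "Z-Image Turbo",
}

# Count how many challenges each outcome appears in.
tally = Counter(winner or "no clear winner" for winner in verdicts.values())

for outcome, count in tally.most_common():
    print(f"{outcome}: {count} of {len(verdicts)}")
```

On this reading the judge sides with DALL-E 3 in two challenges, so the judge's view is less one-sided than the community vote tally.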

Next steps

Explore each model

DALL-E 3

OpenAI

OpenAI's previous-generation image model, with higher quality than DALL-E 2 and support for larger resolutions

Z-Image Turbo

Alibaba

Tongyi-MAI's 6-billion-parameter distilled text-to-image model, optimized for speed; it achieves high-quality generation in 8 steps or fewer and supports bilingual text rendering
