Alibaba's multimodal generation model from the Wan AI suite, supporting text-to-video, image-to-video, reference-to-video with audio, and text-to-image, in both Chinese and English
Settled by community votes across 6 shared challenges, with an AI judge weighing in on each.
Wan 2.6
#23 of 44 in Text-to-Image
Z-Image Turbo
#15 of 44 in Text-to-Image
Where the votes landed
Wan 2.6: 50.0% win rate
Ties: 0.0%
Z-Image Turbo: 50.0% win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Modern Clean Menu
Text-to-Image: “Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”
AI Judge Analysis
Wan 2.6
- + Excellent adherence to the grid-based food photo layout requested in the prompt.
- + Professional typography with distinct sections for Appetizers, Pizza, and Mains.
- + Very clean aesthetic with vibrant color-block accents that enhance the modern feel.
- − Gibberish text in the subtitle and item descriptions.
- − Repeats the 'Pizza' section header twice, once in the grid and once in the list.
Z-Image Turbo
- + Bold, clear sans-serif typography that is very easy to read.
- + Consistent and appetizing food photography throughout the grid.
- + Higher contrast layout that feels energetic and professional.
- − Spelling error in a major heading ('PIZZA MANS').
- − Section headers do not perfectly match the prompt requirements (Mains and SE IIIION instead of Appetizers/Pizza/Mains).
Verdict: Both models followed the prompt well, producing clean, minimalist designs with clear food grids. Wan 2.6 provided a more sophisticated layout with better colored accents and correctly identified all three requested sections, despite some garbled text. Z-Image Turbo had cleaner individual text characters and more readable prices, but the 'PIZZA MANS' typo and odd section naming make it less successful as a professional menu design.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
Wan 2.6
- + Natural chalk texture with realistic smudges and dust
- + Excellent handling of the 'handwritten' request, with authentic slants and strokes
- + Accurately completed the cutoff item from the prompt text
- − Repeats the price on new lines for the first two items, creating clutter
- − Text alignment is a bit messy and crowded at the bottom
Z-Image Turbo
- + Very clean and legible text layout
- + Perfect spelling for most items with a high degree of clarity
- + Consistent font style throughout
- − Handwriting looks slightly more like a digital font than natural chalk
- − Contains a spelling error: 'Mustroom' instead of 'Mushroom'
- − Lacks the authentic chalk smudging and grime seen in the other model
Verdict: Wan 2.6 captures a much more authentic and atmospheric chalk aesthetic with realistic textures and handwriting variations, although it suffers from repetitive price lines. Z-Image Turbo provides a cleaner, more readable layout but has a spelling error and the text looks a bit too much like a clean digital overlay to be truly convincing as hand-drawn chalk.
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
Wan 2.6
- + Excellent photorealism and depth of field
- + Superior light bokeh and texture on the car's exterior
- + The capybara's pose and expression feel more natural and professional
- − The passenger's hand holding the phone is slightly mangled
Z-Image Turbo
- + Successfully includes the seatbelt for the capybara
- + Clearer rendering of the passenger's face
- − The background is quite generic and lacks the vibrant 'Manhattan at night' energy requested
- − The lighting on the capybara is flat compared to the environment
- − The capybara's hand/paw anatomy on the steering wheel is awkward
Verdict: Wan 2.6 is the clear winner due to its superior atmosphere, lighting, and textures, which perfectly capture the 'New York at night' aesthetic. While Z-Image Turbo followed the prompt's logical details well (like the seatbelt), it lacked the professional cinematic quality and detailed background found in Wan 2.6.
Bald man challenge
Image Editing: “Give the person a full, thick head of natural hair with realistic texture, density, and a natural hairline. Preserve facial features and lighting.”
AI Judge Analysis
Wan 2.6
- + Successfully applied a full, thick head of hair as requested.
- + Maintained facial features and lighting with high fidelity to the original.
- + The hair texture and lighting match the environment and existing beard.
- − The hairline on the left side of the forehead looks slightly merged with the temple area.
Z-Image Turbo
- + Preserved the original facial features and background perfectly.
- − Failed the primary edit instruction by only adding thin stubble/buzz cut instead of 'full, thick head of hair'.
- − The hairline remains identical to the bald original, just with added darkening.
Verdict: Wan 2.6 followed the instructions perfectly, providing a realistic and aesthetically pleasing full head of hair that integrates well with the original person's appearance. Z-Image Turbo failed the prompt, providing only a very thin layer of stubble that does not meet the 'full, thick' requirement.
Isometric Miniature Diorama Scenes
Text-to-Image: “Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”
AI Judge Analysis
Wan 2.6
- + Perfectly renders the requested text and flag icon
- + Follows the isometric 45° perspective accurately
- + High-quality textures for the rice and fish and a clean diorama base
- − The 'JAPAN' text is slightly off-center to the left
Z-Image Turbo
- + Pleasing soft cartoon aesthetic
- + Very clean, balanced composition
- − Displays the flag of China instead of the flag of Japan
- − Text layout is less aligned with the specific prompt instructions
- − Textures are simpler and less 'realistic PBR' than requested
Verdict: Wan 2.6 is the clear winner as it accurately followed several specific prompt details that Z-Image Turbo missed, most notably the flag of Japan. Wan 2.6 also succeeded in providing the requested refined textures and complex miniature scene, whereas Z-Image Turbo produced a generic cartoon sushi on a plate with the wrong national flag.
Vintage Cafe Logo
Text-to-Image: “Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”
AI Judge Analysis
Wan 2.6
- + Perfect text rendering for both the name and the banner
- + Includes the requested banner element for the year
- + Superior vintage texture on the background and logo
- − The cloche lacks the traditional handle design seen in better vector emblems
Z-Image Turbo
- + Closer to a minimalist vector icon style
- + Cleaner cloche illustration with better symmetry
- − Failed to include a banner for the 'Est. 1720' text
- − The typography on 'Caffè' is slightly inconsistent in weight and spacing
- − Less background texture than requested
Verdict: Wan 2.6 followed the prompt more accurately by including the requested banner and applying a more visible vintage texture to the background. While Z-Image Turbo has a cleaner vector aesthetic, it missed the banner element and had slightly weaker typography.
Explore each model
Tongyi-MAI's 6-billion-parameter distilled text-to-image model optimized for speed, achieving high-quality generation in 8 steps or fewer with support for bilingual text rendering