Alibaba's multimodal generation model from the Wan AI suite, supporting text-to-video, image-to-video, reference-to-video with audio, and text-to-image, in both Chinese and English
Settled by community votes across 2 shared challenges, with an AI judge weighing in on each.
Wan 2.6
#23 of 44 in Text-to-Image
Wan 2.7 Pro
#29 of 44 in Text-to-Image
Where the votes landed
Wan 2.6
100.0%
win rate
Ties
0.0%
Wan 2.7 Pro
0.0%
win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Chalkboard Menu
Text-to-Image
“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
Wan 2.6
- + Excellent chalk texture with realistic dusty residue and smudge marks.
- + Authentic handwriting style with natural variations in letter weight and pressure.
- + Accurately rendered all requested text with a believable cursive title.
- − The angled perspective makes reading slightly more difficult than a flat view.
Wan 2.7 Pro
- + Perfect text accuracy including completing the truncated prompt for the third item.
- + Clear, centered composition which is easy to read.
- + Consistent handwriting style across all lines of text.
- − Text looks slightly too clean and uniform, resembling a digital chalk font rather than manual handwriting.
- − The chalk texture on the letters lacks the depth and variation found in real chalk strokes.
Verdict: Wan 2.6 provides a much more authentic realization of the prompt's request for 'chalk texture' and 'natural variations,' looking like a genuine physical object. While Wan 2.7 Pro produces incredibly clean and accurate text (even completing the half-written prompt), it lacks the organic, dusty realism of the chalk strokes found in Wan 2.6.
The Capybara Taxi Driver
Text-to-Image
“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
Wan 2.6
- + Excellent photorealism with raindrops on the glass and realistic street bokeh.
- + Captures the bored expression of the passenger perfectly.
- + High level of detail on the capybara's fur and the texture of the coat.
- − The passenger is seated in the front passenger seat instead of the back seat as requested.
- − The capybara's hands look slightly more human than a capybara's natural paws.
Wan 2.7 Pro
- + Successfully places the passenger in the back seat as requested in the prompt.
- + Very clean and sharp rendering of the taxi interior.
- + Correct yellow cap and dark jacket attire as described.
- − The passenger is looking away from the phone rather than at it.
- − The lighting feels a bit more sterile and less cinematic than Wan 2.6's.
- − The capybara's paws transition into primate-like fingers on the steering wheel.
Verdict: Both models followed the complex prompt well, but each struggled with a different element. Wan 2.6 produced a more atmospheric and 'photorealistic' scene with rain and lighting, but failed to put the passenger in the back seat. Wan 2.7 Pro correctly placed the passenger in the back seat but missed the detail of her looking at the phone, and its overall lighting is less convincing than Wan 2.6's.
Explore each model
Alibaba's Wan 2.7 Pro image generation and editing model with higher-quality outputs and support for 4K image generation