Alibaba's multimodal generation model from the Wan AI suite, supporting text-to-video, image-to-video, reference-to-video with audio, and text-to-image, in both Chinese and English
Settled by community votes across 2 shared challenges, with an AI judge weighing in on each.
Wan 2.6
#23 of 44 in Text-to-Image
Not enough comparable category data
The chart appears once both models have ratings across at least three shared arena categories.
Wan 2.7
#34 of 44 in Text-to-Image
Where the votes landed
Wan 2.6: 0.0% win rate
Ties: 100.0%
Wan 2.7: 0.0% win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
Wan 2.6
- + Excellent chalk texture with realistic smudges and dusty residue
- + Truly handwritten appearance with natural variations in letter size and slant
- + Perfect text accuracy including the clipped prompt text completion
- − The 'cursive' requirement for the title is only partially met with print/cursive hybrid letters
Wan 2.7
- + Perfect text rendering without any spelling errors
- + Clean and centered composition
- + Attractive cafe-style background lighting
- − Text looks like a digital font rather than natural chalk handwriting
- − Fails the 'no printed or digital fonts' requirement
- − Lacks the specific chalky texture and variations requested in the prompt
Verdict: Wan 2.6 is the clear winner as it successfully captured the 'handwritten' and 'chalk texture' requirements, appearing like an authentic chalkboard. Wan 2.7, despite having perfect legibility, used a clean digital-looking font that ignored the instructions for natural handwriting and chalk variations.
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
Wan 2.6
- + Excellent photorealism with realistic low-light noise and rain textures
- + Captures the 'bored' expression of the passenger perfectly
- + The capybara's pose and anatomy look more integrated with the driver seat
- − The capybara's paws are somewhat indistinct and blend into the steering wheel
Wan 2.7
- + Clearer depiction of the capybara's paws on the steering wheel
- + Very sharp image resolution and clean lighting
- − The capybara's fur has a slightly artificial, 'rendered' look compared to Wan 2.6
- − The passenger is positioned awkwardly close to the driver, making the backseat feel like a front seat
Verdict: Wan 2.6 is the winner because it achieves a much higher level of cinematic photorealism, particularly in the lighting and atmosphere of a New York taxi at night. While Wan 2.7 has clearer details on the paws, its spatial arrangement of the car interior is confusing, whereas Wan 2.6 perfectly captures the requested bored expression and realistic depth.
Explore each model
Alibaba's Wan 2.7 image generation and editing model for text-to-image, reference-guided generation, and instruction-based image edits