Improved version of Alibaba's Qwen image model with better text rendering, finer natural textures, and more realistic human generation.
Settled by community votes across 2 shared challenges, with an AI judge weighing in on each.
Qwen Image 2512
#26 of 44 in Text-to-Image
Wan 2.7
#34 of 44 in Text-to-Image
Where the votes landed
Qwen Image 2512: 0% win rate
Ties: 0%
Wan 2.7: 0% win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Chalkboard Menu
Text-to-Image
“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
Qwen Image 2512
- + Excellent chalk texture with realistic smudges and dust on the board.
- + Authentic handwritten variation in cursive style that looks truly manual.
- + Correct interpretation of a framed tabletop chalkboard in a cafe setting.
- − Spelling error in 'Risotto' (spelled 'Risitto').
Wan 2.7
- + Perfect spelling for all requested menu items.
- + Very clear and legible text alignment.
- + Good lighting and background environment detail.
- − The text looks more like a digital font or vector asset with a drop shadow rather than genuine chalk on a board.
- − The 'handwriting' is too uniform, lacking the natural variation requested in the prompt.
Verdict: Qwen Image 2512 produces a much more authentic 'handwritten' chalkboard look with realistic textures and manual flourishes, despite a minor spelling error. Wan 2.7 has perfect spelling but fails the stylistic requirement, as the text appears as a clean digital overlay rather than realistic chalk.
The Capybara Taxi Driver
Text-to-Image
“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
Qwen Image 2512
- + Excellent adherence to the 'bored' facial expression on the human passenger.
- + Correct positioning of the human in the back seat as per the prompt.
- + Highly detailed fur texture and realistic reflection on the windshield.
- − The paws on the steering wheel look slightly anthropomorphized and claw-like.
- − The camera perspective through the windshield is slightly crowded.
Wan 2.7
- + Natural profile lighting on the capybara's face.
- + Good anatomy on the capybara's paws holding the wheel.
- + Clear view of the Manhattan street environment.
- − Failed to place the human in the back seat; she is in the front passenger seat.
- − The capybara's fur has a slightly more artificial, repetitive texture.
- − The passenger is looking at the phone but is sitting next to the driver, changing the requested dynamic.
Verdict: Qwen Image 2512 followed the prompt's spatial instructions much better, correctly placing the human passenger in the back seat with a distinct 'bored' expression. Wan 2.7 generated a high-quality image but failed on the layout by placing the passenger in the front seat, which misses the intended 'taxi ride' narrative.
Explore each model
Alibaba's Wan 2.7 image generation and editing model for text-to-image, reference-guided generation, and instruction-based image edits.