Alibaba's multimodal generation model from the Wan AI suite. It supports text-to-video, image-to-video, reference-to-video with audio, and text-to-image, with prompts in both Chinese and English.

Wan 2.6 Benchmarks

Wan 2.6 ranks 23rd globally in text-to-image with an Elo of 1214, and places considerably higher in image editing, at 7th with an Elo of 1220. It remains competitive and stable across the Wan AI suite for both English and Chinese prompts.

Lumenfall Arena: #4 in Text-to-Video, 1127 Elo

Text-to-Video Landscape

Competition Results

Uncategorized

Text-to-Video
Prompt

“A medium shot of Will Smith sitting at a cozy, dimly lit Italian restaurant, twirling a forkful of spaghetti with a playful grin on his face. The camera slowly dollies in as he takes a bite, capturing the rich, steaming sauce. Warm candlelight, cinematic lighting, hyperrealistic, 4K. Smooth motion, consistent face, detailed hands, natural spaghetti physics.”
