Stability AI's 2.5-billion-parameter text-to-image model, built on the improved Multimodal Diffusion Transformer (MMDiT-X) architecture and optimized for consumer hardware. It offers improved image quality, typography, and understanding of complex prompts.
These two models have not faced off in a shared challenge yet. Here is how their skills compare side by side.
Stable Diffusion 3.5 Medium
#41 of 44 in Text-to-Image
Not enough comparable category data
The chart appears once both models have ratings in at least three shared arena categories.
Wan 2.6
#23 of 44 in Text-to-Image
Explore each model
Alibaba's multimodal generation model from the Wan AI suite. It supports text-to-video, image-to-video, reference-to-video (with audio), and text-to-image generation, in both Chinese and English.