Black Forest Labs' open-weights multimodal flow transformer for in-context image generation and editing, available for non-commercial use with character consistency and style transfer capabilities
Settled by community votes on one shared challenge, with an AI judge weighing in.
FLUX.1 Kontext [dev]
#43 of 44 in Text-to-Image
Z-Image Turbo
#15 of 44 in Text-to-Image
Where the votes landed
FLUX.1 Kontext [dev]: 0.0% win rate
Ties: 0.0%
Z-Image Turbo: 100.0% win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Magic Burger Explosion: Fiery Photorealism Challenge
Text-to-Image
“Ad for 'Magic Burger'. Dynamic, exploded burger with all components (bun, patty, cheese, lettuce, tomato, sauce) suspended in mid-air. Emphasize photorealistic detail and a sense of motion. Dark, fiery background with glowing embers. Integrate text: 'MAGIC BURGER' as a prominent title, 'LIMITED TIME ONLY' as a secondary message, and '€6.99' in a starburst, all rendered with a fiery, glowing effect.”
AI Judge Analysis
FLUX.1 Kontext [dev]
- + Strong thematic background with coals and flames
- + Bold, readable main title text
- − Failed to provide an 'exploded' view of components
- − Spelling error in the secondary message
Z-Image Turbo
- + Excellent typography with better glowing effects
- + Higher level of textural detail in the food components
- − Failed to provide the requested 'exploded' mid-air suspension
- − Starburst design is slightly cluttered
Verdict: Both models failed the specific 'exploded' composition instruction, instead rendering standard stacked burgers. Z-Image Turbo wins due to superior text accuracy and more realistic food textures, whereas FLUX.1 Kontext [dev] had a significant spelling error and less polished typography.
Explore each model
Tongyi-MAI's 6-billion-parameter distilled text-to-image model optimized for speed, achieving high-quality generation in 8 steps or fewer with support for bilingual text rendering