FLUX.2 [dev] vs Z-Image Turbo
Head-to-head across 5 challenges
FLUX.2 [dev]: 33.3% win rate
Ties: 33.3%
Z-Image Turbo: 33.3% win rate
Challenge Results
Geometric Composition
Text-to-Image: “A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”
AI Judge Analysis
FLUX.2 [dev]
- + Excellent rendering of thick, realistic glass with appropriate refractive qualities.
- + The plant is clearly visible through the glass as requested.
- + Cinematic lighting that accurately reflects the soft window light from the left.
- − The blue sphere appears to be floating slightly above the bottom surface.
- − The glass cube has rounded interior corners that make it look more like a vase or container than a geometric cube.
Z-Image Turbo
- + Sharp, precise geometric cube shapes with clean edges.
- + Realistic texture on the red book, including wear and paper detail.
- + The blue sphere is correctly seated on the bottom surface with a reflection.
- − The plant in the background is very blurry and barely visible through the glass cube itself.
- − The lighting is somewhat flat compared to the atmosphere in the FLUX.2 [dev] image.
Verdict: FLUX.2 [dev] produces a more atmospheric and aesthetically pleasing image with superior glass physics and better adherence to the requirement of seeing the plant through the glass. Z-Image Turbo captures the geometric 'cube' shape more accurately and has better book textures, but fails to make the plant significantly visible through the glass medium.
Candid Street Photography
Text-to-Image: “A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”
AI Judge Analysis
FLUX.2 [dev]
- + Excellent adherence to the 'motion blur from passing cars' prompt element.
- + Very realistic skin texture and facial features for the elderly man.
- + Superior lighting and reflections on the wet pavement creating a cinematic atmosphere.
- + Highly detailed and realistic bicycle components.
- − The bicycle frame geometry becomes slightly nonsensical near the bottom bracket.
- − The man's hands are interacting with a complex mess of cables that looks a bit cluttered.
Z-Image Turbo
- + Clearer view of the 'red bicycle' as requested.
- + Good depiction of light rain falling against the background cars.
- − Failed to include motion blur for the passing cars, which remain static.
- − The skin texture and lighting look flatter and less cinematic than requested.
- − The man appears to be just holding the bike rather than repairing it.
Verdict: FLUX.2 [dev] followed the complex prompt requirements much more effectively, specifically capturing the motion blur of traffic and the cinematic wet-weather atmosphere. While Z-Image Turbo produced a clear image, it missed the key stylistic instruction for motion blur and the subject appears to be posing with the bike rather than repairing it.
Fantasy Warrior
Text-to-Image: “Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”
AI Judge Analysis
FLUX.2 [dev]
- + Excellent execution of the braided hair with beads as requested.
- + Highly detailed texture on leather straps and metal engravings.
- + Very lifelike eyes with realistic skin texture and scars.
- − The torch in the background is slightly blurry and less defined than in the Z-Image Turbo image.
Z-Image Turbo
- + Strong atmospheric lighting with a well-defined torch and bokeh sparks.
- + Good representation of ornate plate armor and underlayers.
- + Effective use of shallow depth of field for a cinematic look.
- − The beads in the hair look more like metallic studs or sequins than traditional beads.
- − The skin texture and scar details are slightly softer than in the FLUX.2 [dev] image.
Verdict: FLUX.2 [dev] followed the prompt more precisely, particularly regarding the hair braids and beads, and delivered superior texture detail on the leather and skin. Z-Image Turbo produced a beautiful cinematic image with high-quality armor, but it felt slightly more generic in its facial details and interpretation of the beads. FLUX.2 [dev] is the winner for its impressive photorealism and adherence to the fine details of the request.
Adorable Baby Animals in Sunny Meadow
Text-to-Image: “Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”
AI Judge Analysis
FLUX.2 [dev]
- + Excellent anatomical accuracy for all animals.
- + Superior rendering of 'god rays' and atmospheric lighting.
- + Highly detailed fur texture and realistic morning dew sparkles.
- − The animals are sitting rather than 'tumbling' as requested in the prompt.
- − Included an extra rabbit.
Z-Image Turbo
- + Captures the 'tumbling' and 'playful' motion much better than FLUX.2 [dev].
- + Bright, vibrant colors that fit the 'joyful wholesome vibe'.
- + Good adherence to the types of animals requested.
- − Noticeable anatomical issues, such as the puppy's paw merging into the rabbit.
- − The cat's facial structure and open mouth look slightly distorted and unnatural.
- − Lower overall resolution and fine detail than the FLUX.2 [dev] image.
Verdict: FLUX.2 [dev] produces a much more technically proficient and realistic image with beautiful lighting and textures, though it is more static in composition. Z-Image Turbo better captures the requested 'tumbling' action, but suffers from significant anatomical merging and less refined details in the fur and faces. FLUX.2 [dev] is the winner for its superior visual quality and realism.
Heroic Super Hero Portrait
Text-to-Image: “Hyper-photorealistic full-body portrait of a female superhero standing triumphantly on a New York skyscraper rooftop at golden sunset, wearing a classic modest superhero costume with flowing cape, chest emblem, gloves, and boots in red and blue colors, practical design, short hair, strong determined heroic expression looking into the distance, powerful confident stance with hands on hips and cape billowing dramatically in the wind, detailed urban cityscape background, warm natural sunlight with sharp shadows and fabric highlights, ultra-sharp textures on suit, hair, and concrete, 8K masterpiece, empowering family-friendly style.”
AI Judge Analysis
FLUX.2 [dev]
- + Excellent full-body composition with a true sense of scale.
- + Impressive cityscape detail and lighting integration.
- + Great costume texture and dramatic cape flow.
- − The fingers on the hands are slightly warped and indistinct.
- − The skyscraper edge she is standing on looks a bit jagged and unnatural.
Z-Image Turbo
- + Realistic facial features and very natural skin and hair rendering.
- + The lighting on the rooftop surface is very convincing.
- + Clean hand anatomy and glove detail.
- − The cityscape in the background is overly blurred and lacks the requested detail.
- − The inclusion of a skirt moves the costume slightly away from the 'classic' sleek design implied by the rest of the prompt.
- − The 'S' emblem is slightly asymmetrical.
Verdict: FLUX.2 [dev] delivers a much more cinematic and detailed cityscape that captures the 'triumphant' scale requested, though it suffers from minor anatomical issues in the hands. Z-Image Turbo has superior skin and facial realism, but fails to provide the detailed urban background specified in the prompt, resulting in a shallower, less impressive environment.
FLUX.2 [dev]
Black Forest Labs' open-weights image generation model with frontier performance, available for non-commercial local deployment
Z-Image Turbo
Tongyi-MAI's 6-billion-parameter distilled text-to-image model, optimized for speed: it achieves high-quality generation in 8 steps or fewer and supports bilingual text rendering