FLUX.2 [max] (Black Forest Labs) vs. Z-Image Turbo (Alibaba)

Settled by community votes across 9 shared challenges, with an AI judge weighing in on each.

FLUX.2 [max]

25.9 arena score

#11 of 44 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.

Z-Image Turbo

24.7 arena score

#15 of 44 in Text-to-Image

Vote tally

Where the votes landed

FLUX.2 [max]

81.8%

win rate

Ties

0.0%

Z-Image Turbo

18.2%

win rate

Shared challenges 9
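The win-rate percentages above can be reproduced from raw vote counts with a short calculation. The sketch below is illustrative only: the page does not publish raw counts, so the tally of 9 votes to 2 with no ties is a hypothetical one that happens to match the published 81.8% / 0.0% / 18.2% split, and the function name `win_rates` is an assumption rather than anything the arena exposes.

```python
def win_rates(wins_a: int, wins_b: int, ties: int = 0) -> tuple[float, float, float]:
    """Return (model A %, ties %, model B %) rounded to one decimal place."""
    total = wins_a + wins_b + ties
    if total == 0:
        raise ValueError("no votes recorded")
    # Each share is that outcome's count divided by all votes cast.
    return tuple(round(100 * n / total, 1) for n in (wins_a, ties, wins_b))


# Hypothetical tally (not stated on the page): 9 votes for FLUX.2 [max],
# 2 votes for Z-Image Turbo, no ties.
flux_pct, tie_pct, zimage_pct = win_rates(wins_a=9, wins_b=2, ties=0)
print(flux_pct, tie_pct, zimage_pct)
```

Because the shares are rounded independently, they may not always sum to exactly 100.0 for other tallies.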

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

FLUX.2 [max]

Z-Image Turbo

AI Judge Analysis

FLUX.2 [max]

+ Excellent photographic quality with realistic textures and lighting.
+ Strong adherence to all spatial requirements, including plant visibility through glass.
+ Sophisticated lighting showing soft window light and caustic reflections.

− The plant is slightly less 'behind' the cube than in Z-Image Turbo's image, though still partially visible through it.

Z-Image Turbo

+ Good adherence to the basic prompt elements.
+ Clean composition with a natural-looking wooden table.

− Lower overall resolution and clarity compared to FLUX.2 [max].
− The plant is almost entirely blurred out, losing the 'partially visible through the glass' effect.
− Lighting is flat compared to the soft directional light requested.

Verdict: FLUX.2 [max] significantly outperforms Z-Image Turbo in terms of visual fidelity, lighting complexity, and material textures. While both models followed the prompt's spatial instructions well, FLUX.2 [max] created a much more convincing scene with realistic glass reflections and a higher-quality render of the book and plant.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

FLUX.2 [max]

Z-Image Turbo

FLUX.2 [max] 67% wins, 0% ties, Z-Image Turbo 33% wins

AI Judge Analysis

FLUX.2 [max]

+ Excellent adherence to technical prompts like shallow depth of field and motion blur.
+ Highly realistic skin textures and fine details on the jacket and bicycle.
+ Atmospheric lighting and reflections that create a cinematic feel.

− Minor anatomical distortion where the hand and the brake/wire of the bike merge.

Z-Image Turbo

+ Clear, well-lit subject and clean bicycle geometry.
+ Shows the full body of the subject in a natural pose.

− Failed to produce the requested 'shallow depth of field' and 'motion blur'.
− Lacks the 'cinematic' and 'no stylization' quality, appearing more like a standard digital snapshot.
− The rain effect is barely visible and the pavement reflections are underwhelming.

Verdict: FLUX.2 [max] followed the prompt instructions much more accurately, successfully incorporating complex photography elements like shallow depth of field, motion blur on passing cars, and detailed skin textures. Z-Image Turbo produced a generic, sharp image that ignored most of the atmospheric and technical camera requirements. FLUX.2 [max] is the clear winner for its superior realism and mood.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

FLUX.2 [max]

Z-Image Turbo

FLUX.2 [max] 75% wins, 0% ties, Z-Image Turbo 25% wins

AI Judge Analysis

FLUX.2 [max]

+ Perfect adherence to section headers (Appetizers, Pizza, Mains)
+ High-quality, distinct food photos in a clean grid
+ Excellent text rendering for main titles and price columns

− Small body text is mostly gibberish
− The 'Appetizers' section contains photos of pizzas and burgers

Z-Image Turbo

+ Modern, high-contrast block layout
+ Higher quality individual food photography
+ Clean minimalist orange and black color scheme

− Failed to include 'Mains' as a separate section header, merging it into 'Pizza Mans'
− Incorrect text spelling in headers ('SE TIIION')
− Layout is a bit cluttered with large blocks overlapping food photos

Verdict: FLUX.2 [max] followed the prompt more accurately by providing all three requested sections (Appetizers, Pizza, Mains) with highly legible headers and a professional vertical layout. Z-Image Turbo produced more striking food photography but suffered from significant spelling errors and failed to properly categorize the menu sections as requested.

Magic Burger Explosion: Fiery Photorealism Challenge

Text-to-Image

“Ad for 'Magic Burger'. Dynamic, exploded burger with all components (bun, patty, cheese, lettuce, tomato, sauce) suspended in mid-air. Emphasize photorealistic detail and a sense of motion. Dark, fiery background with glowing embers. Integrate text: 'MAGIC BURGER' as a prominent title, 'LIMITED TIME ONLY' as a secondary message, and '€6.99' in a starburst, all rendered with a fiery, glowing effect.”

FLUX.2 [max]

Z-Image Turbo

AI Judge Analysis

FLUX.2 [max]

+ Excellent adherence to the 'exploded' layout with clearly separated components.
+ Exceptional text rendering with several distinct fonts and effects.
+ High photorealistic detail on the textures of the meat and bread.

− The starburst for the price looks a bit like a flat graphic compared to the 3D scene.

Z-Image Turbo

+ Strong fiery atmosphere and lighting integration.
+ The price starburst is beautifully rendered with a 3D fiery effect.
+ Good edible appeal with glossy sauce highlights.

− Failed the 'exploded' instruction as ingredients are still mostly stacked.
− Includes repetitive text ('MAGIC BURGER BURGER').
− Lower resolution/clarity on the text 'LIMITED TIME ONLY'.

Verdict: FLUX.2 [max] followed the complex layout instructions much better, delivering a true 'exploded' view with all text elements perfectly rendered and positioned. Z-Image Turbo created a more cohesive fiery atmosphere and a better price starburst, but it failed the core structural prompt of separating the components and introduced redundant text.

The Capybara Taxi Driver

Text-to-Image

“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”

FLUX.2 [max]

Z-Image Turbo

AI Judge Analysis

FLUX.2 [max]

+ Excellent photorealism in the textures of the leather jacket and car interior.
+ Precise adherence to the 'both front paws on the steering wheel' instruction.
+ Natural-looking lighting and depth of field that feels like a real photograph.

− The capybara is wearing gloves, which was not requested and masks its paws.
− The proportions of the capybara's body are slightly anthropomorphized and elongated to fit the seat.

Z-Image Turbo

+ Great character design for the capybara with a very professional expression.
+ Excellent background bokeh and lighting that captures the New York night atmosphere.

− Fails the steering wheel instruction, with one paw floating in the air.
− Anatomy of the paws/hands looks slightly distorted and less 'capybara-like'.
− The businesswoman in the back seat looks slightly like a mannequin rather than a real person.

Verdict: FLUX.2 [max] is the winner as it fully adheres to the physical positioning requested in the prompt, placing both paws on the wheel, and features superior textures in the car interior. Z-Image Turbo produces a more front-facing, charming character, but fails the specific steering wheel instruction and has less realistic lighting.

Bald man challenge

Image Editing

Edit instruction

“Give the person a full, thick head of natural hair with realistic texture, density, and a natural hairline. Preserve facial features and lighting.”

FLUX.2 [max]

Z-Image Turbo

FLUX.2 [max] 100% wins, 0% ties, Z-Image Turbo 0% wins

AI Judge Analysis

FLUX.2 [max]

+ Successfully added a full head of hair with realistic texture.
+ Excellent preservation of the jacket, background, and original person's features.
+ Matches the lighting of the scene perfectly.

− Unnecessarily modified the beard color to include white/grey patches not present in the original.

Z-Image Turbo

+ Maintains the original skin tone and facial structure well.

− Completely failed the main edit instruction (remains bald).
− Significantly altered the background environment from desert scrub to dry grass.
− Modified the clothing details despite the prompt to preserve features.

Verdict: FLUX.2 [max] successfully executed the requested edit, providing a realistic head of hair while keeping the background and person's likeness intact. Z-Image Turbo failed to add any hair and also altered the background of the image, which was not requested.

Isometric Miniature Diorama Scenes

Text-to-Image

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”

FLUX.2 [max]

Z-Image Turbo

FLUX.2 [max] 100% wins, 0% ties, Z-Image Turbo 0% wins

AI Judge Analysis

FLUX.2 [max]

+ Perfectly followed the flag requirement by including the Japanese flag.
+ Excellent miniature diorama construction with multiple levels and realistic textures.
+ Superior text rendering and alignment.

− The camera angle is slightly lower than a true 45-degree isometric view.

Z-Image Turbo

+ Very clean, soft textures that match the '3D cartoon' style requested.
+ Good centered composition with a clear focus on the main subject.

− Incorrectly used the flag of China instead of Japan.
− Perspective of the plate feels slightly warped compared to the base.

Verdict: FLUX.2 [max] followed every part of the prompt, including the correct flag and a well-structured multi-level diorama. Z-Image Turbo produced a high-quality stylized image but failed the logical check by placing a Chinese flag next to the text for Japan.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

FLUX.2 [max]

Z-Image Turbo

AI Judge Analysis

FLUX.2 [max]

+ Excellent prompt adherence with all four animals clearly defined and interacting with butterflies.
+ Superior lighting with realistic god rays and dew sparkles that create a high-end cinematic feel.
+ Higher level of detail in the fur texture and the variety of wildflowers in the meadow.

− The fox's anatomy is a bit stiff in its leaping pose.
− The kitten's size is slightly small relative to the bunny.

Z-Image Turbo

+ Very cute, expressive facial expressions on the puppy and kitten.
+ Bright, vibrant colors that reinforce the 'joyful wholesome vibe'.
+ Clean composition with a clear focal point.

− The puppy's paw appears to be merging strangely with the bunny's back.
− Lower overall resolution and less 'hyper-photorealistic' than requested, feeling more like a digital illustration.
− Missing some of the atmospheric details like the dew sparkles and distinct god rays seen in the other version.

Verdict: FLUX.2 [max] is the clear winner as it successfully captured the hyper-photorealistic requirement with sophisticated lighting and complex environmental details like dew and god rays. Z-Image Turbo produced a very cute image, but it struggled with anatomical merging (puppy paw and bunny) and lacked the technical '8K masterpiece' finish that FLUX.2 [max] achieved.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

FLUX.2 [max]

Z-Image Turbo

AI Judge Analysis

FLUX.2 [max]

+ Perfect adherence to typography and punctuation with the grave accent in 'Caffè'
+ Excellent emblem composition including the requested banner and circular frame
+ High-quality subtle paper texture on the background

− The 'steam' lines are a bit thin and faint

Z-Image Turbo

+ Strong minimalist vector style with high contrast
+ Correct spelling and accentuation of the text
+ Clear cloche iconography

− The layout is less integrated as an 'emblem' compared to FLUX.2 [max]
− The background lacks the 'subtle texture' requested
− The steam effect is very abstract and minimal

Verdict: FLUX.2 [max] produced a much more sophisticated and professional-looking logo that captures the 'vintage' and 'emblem' aspects of the prompt perfectly. Z-Image Turbo followed the instructions well, but the final result feels like a basic clip-art arrangement compared to the cohesive design of FLUX.2 [max].

Next steps

Explore each model

FLUX.2 [max]

Black Forest Labs

Black Forest Labs' flagship image generation model, delivering state-of-the-art quality with exceptional realism, precision, and consistency for both text-to-image and advanced image editing.

Z-Image Turbo

Alibaba

Tongyi-MAI's 6-billion-parameter distilled text-to-image model, optimized for speed and achieving high-quality generation in 8 steps or fewer, with support for bilingual text rendering.
