Imagen 4.0 Ultra Generate 001 (Google) vs. Z-Image Turbo (Alibaba)

Settled by community votes across 6 shared challenges, with an AI judge weighing in on each.

Imagen 4.0 Ultra Generate 001

22.3 arena score

#28 of 44 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.
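The eligibility rule above can be sketched as a small check. This is a minimal Python sketch, not the arena's actual implementation: the function name, the dictionary layout, and the category names and rating values in the usage example are all hypothetical. Only the threshold of three shared categories comes from the page.

```python
# Threshold stated on the page: at least three shared arena categories.
MIN_SHARED_CATEGORIES = 3


def can_show_skill_chart(ratings_a: dict[str, float], ratings_b: dict[str, float]) -> bool:
    """Return True when both models have ratings in enough shared categories.

    Each argument maps a category name to that model's rating in it
    (a hypothetical layout chosen for illustration).
    """
    shared = ratings_a.keys() & ratings_b.keys()  # categories rated for both models
    return len(shared) >= MIN_SHARED_CATEGORIES


# Hypothetical input: only one category in common, so no chart is drawn.
model_a = {"Text-to-Image": 1.0}
model_b = {"Text-to-Image": 1.0, "Image Editing": 1.0}
print(can_show_skill_chart(model_a, model_b))  # False
```

With a single shared category, as in the usage example, the check fails, which matches the "Not enough comparable category data" message shown here.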

Z-Image Turbo

24.7 arena score

#15 of 44 in Text-to-Image

Vote tally

Where the votes landed

Imagen 4.0 Ultra Generate 001

20.0%

win rate

Ties

10.0%

Z-Image Turbo

70.0%

win rate


Shared challenges: 6
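The percentages in the tally are each outcome's share of all votes cast. A minimal Python sketch of that arithmetic follows. The page does not publish raw vote counts, so the counts used here are hypothetical, picked only because they reproduce the displayed 20.0% / 10.0% / 70.0% split. The function name is also an assumption.

```python
def vote_shares(wins_a: int, ties: int, wins_b: int) -> tuple[float, float, float]:
    """Return (model A win rate, tie rate, model B win rate) as percentages.

    Each value is that outcome's share of all votes, rounded to one decimal.
    """
    total = wins_a + ties + wins_b
    if total == 0:
        raise ValueError("no votes recorded")
    return tuple(round(100 * n / total, 1) for n in (wins_a, ties, wins_b))


# Hypothetical counts (not taken from the page): 2 wins, 1 tie, 7 wins.
print(vote_shares(2, 1, 7))  # (20.0, 10.0, 70.0)
```

Any counts in the same 2:1:7 ratio would give the same percentages, so the displayed split alone does not reveal how many votes were cast.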

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

Community votes: Imagen 4.0 Ultra Generate 001 50% wins, 0% ties, Z-Image Turbo 50% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

+ Excellent adherence to the 'partially visible through the glass' instruction for the plant.
+ Superior lighting and texture detail, particularly in the wooden grain and the book cover.
+ Very high resolution and photographic clarity.

− The blue sphere appears to be floating rather than resting on the bottom surface of the cube.
− The glass cube is depicted as a solid block of glass rather than a hollow container.

Z-Image Turbo

+ Accurately depicts the glass cube as a hollow container with the sphere resting on the bottom.
+ Good overall composition and focus.

− Fails to show the green plant through the glass of the cube, as requested.
− The lighting is flatter and the image resolution is lower than in Imagen 4.0 Ultra Generate 001's image.
− The book has some minor edge fraying artifacts.

Verdict: Imagen 4.0 Ultra Generate 001 produces a much more visually striking and detailed image with complex lighting and excellent prompt adherence regarding the plant's visibility through the cube. While Z-Image Turbo more logically places the sphere at the bottom of a hollow cube, it fails the specific instruction to show the plant through the glass and lacks the high-end photographic finish of the former.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

Community votes: Imagen 4.0 Ultra Generate 001 0% wins, 0% ties, Z-Image Turbo 100% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

+ Excellent adherence to the 'repairing' aspect of the prompt with realistic tools and interaction.
+ Very high skin texture detail and realistic facial features.
+ Captures the 'motion blur' of passing cars and wet pavement reflections perfectly.

− The rain drops on the jacket look a bit like static or noise rather than natural droplets.
− The red bicycle is partially cut off, though this fits the 'imperfect framing' prompt.

Z-Image Turbo

+ Good overall composition and color balance.
+ Accurately depicts the elderly Japanese man and the red bicycle in the rain.

− Fails to show the man 'repairing' the bike; he is simply holding or mounting it.
− The background car is static, ignoring the 'motion blur' request.
− Lower overall detail in skin texture and environment compared to Imagen 4.0 Ultra Generate 001.

Verdict: Imagen 4.0 Ultra provided a much more accurate interpretation of the prompt, specifically capturing the 'repairing' action and the motion blur of passing traffic. While Z-Image Turbo produced a clean image, it missed several technical descriptors like the motion blur and the specific activity requested, resulting in a more generic scene.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

Community votes: Imagen 4.0 Ultra Generate 001 0% wins, 100% ties, Z-Image Turbo 0% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

+ Excellent layout following the requested sections (Appetizers, Pizza, Mains)
+ High-quality, realistic food photography in a consistent grid
+ Clean and professional minimalist aesthetic with appropriate white space

− Text becomes garbled at smaller sizes
− Some food items don't perfectly match their category (pizza images under 'Appetizers')

Z-Image Turbo

+ Stronger use of vibrant orange accent colors
+ Good text alignment for prices and menu items
+ Bold, readable header fonts

− Confusing sectioning with 'PIZZA MANS' and 'SE IIION' headers
− Layout is a bit cluttered and lacks the professional whitespace of Imagen 4.0 Ultra Generate 001
− Text contains more obvious spelling errors in headers

Verdict: Imagen 4.0 Ultra provided a much more realistic and professional menu layout that adheres strictly to the requested category structure (Appetizers, Pizza, Mains). While Z-Image Turbo captures the 'vibrant accents' better with its orange blocks, its layout is more disorganized, and the header text is poorly rendered compared to the cleaner execution of Imagen 4.0 Ultra.

Isometric Miniature Diorama Scenes

Text-to-Image

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”

Community votes: Imagen 4.0 Ultra Generate 001 50% wins, 0% ties, Z-Image Turbo 50% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

+ Perfect adherence to all text requirements including the Japan flag icon
+ High variety of sushi types on the diorama base
+ Excellent clean 3D render with soft, professional lighting

− None

Z-Image Turbo

+ Very soft, appealing 3D modeling and lighting
+ Clean text rendering for 'JAPAN' and 'SUSHI'

− Incorrect flag icon (displays the flag of China instead of Japan)
− Minimal content compared to Imagen 4.0 Ultra Generate 001's image, showing only one piece of sushi

Verdict: Imagen 4.0 Ultra is the clear winner as it followed every instruction, including the correct flag for Japan. Z-Image Turbo failed a critical cultural context check by placing a Chinese flag next to the text 'JAPAN', and provided a much simpler composition.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

Community votes: Imagen 4.0 Ultra Generate 001 0% wins, 0% ties, Z-Image Turbo 100% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

+ Excellent adherence to the 'creature variety' with clearly distinct features for all four animals.
+ Strong dynamic lighting with vivid god rays and beautiful dewdrop details on the grass.
+ Very high resolution and sharp fur textures throughout the composition.

− The image has a slightly stylized, illustrative feel rather than 'hyper-photorealistic'.
− The kitten's pose and anatomy look somewhat stiff and artificial.

Z-Image Turbo

+ Achieves a higher degree of photorealism with more naturalistic fur and soft focus depth-of-field.
+ The animals feel more integrated into the environment and are physically interacting/tumbling as requested.
+ Very cute facial expressions that feel organic and less 'posed'.

− The fox's eyes appear slightly distorted or unnatural upon close inspection.
− The kitten's front paw blends awkwardly into the puppy's fur, suggesting a clipping artifact.
− Fewer butterflies and less distinct 'god rays' compared to the other model.

Verdict: Imagen 4.0 Ultra Generate 001 provides a very vibrant and clear 8K scene with excellent lighting effects, though it leans more toward a high-end digital illustration style. Z-Image Turbo better captures the 'hyper-photorealistic' part of the prompt with more natural textures and a more convincing 'tumbling' interaction, despite some minor anatomical artifacts on the fox and kitten. Z-Image Turbo is the likely winner for better capturing the specific request for realism and the playful action described.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

Community votes: Imagen 4.0 Ultra Generate 001 0% wins, 0% ties, Z-Image Turbo 100% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

+ Perfect text rendering including the grave accent in 'Caffè'
+ Sophisticated vector line-work with classic etching details
+ Excellent composition and adherence to the 'vintage minimalist' aesthetic

− The 'subtle texture' on the background is very faint, appearing almost flat

Z-Image Turbo

+ Bold, high-contrast vector style that is very legible
+ Accurate text and date rendering
+ Warm color palette well-aligned with the prompt

− The typography is a bit modern and generic compared to the requested 'classic' style
− The steam effect is slightly simplified and lacks the elegance of Imagen 4.0 Ultra Generate 001's

Verdict: Both models followed the prompt instructions accurately, including the specific text and date. Imagen 4.0 Ultra Generate 001 is the winner due to its superior typography and finer illustrative details on the cloche, which better capture the 'classic' and 'vintage' feel requested. Z-Image Turbo produced a solid logo, but it feels more like a modern interpretation than a truly vintage emblem.

Next steps

Explore each model

Imagen 4.0 Ultra Generate 001

Google

Google's Imagen 4.0 Ultra model, offering the highest fidelity and resolution for professional-grade image generation


Z-Image Turbo

Alibaba

Tongyi-MAI's 6-billion-parameter distilled text-to-image model, optimized for speed, achieving high-quality generation in 8 steps or fewer with support for bilingual text rendering
