Grok Imagine Image Pro (xAI) vs. Z-Image Turbo (Alibaba)

Settled by community votes across 10 shared challenges, with an AI judge weighing in on each.

Grok Imagine Image Pro

24.8 arena score

#14 of 44 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.

Z-Image Turbo

24.7 arena score

#15 of 44 in Text-to-Image

Vote tally

Where the votes landed

Grok Imagine Image Pro

71.4%

win rate

Ties

0.0%

Z-Image Turbo

28.6%

win rate

Shared challenges 10
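
The win-rate percentages above are each side's share of all votes cast, ties included. A minimal sketch of that arithmetic follows. The function name `win_rates` and the 5-0-2 vote split are assumptions for illustration only: the page reports percentages, not raw vote counts, and any split in the same ratio would yield the same figures.

```python
def win_rates(wins_a: int, ties: int, wins_b: int) -> tuple[float, ...]:
    """Return (A win %, tie %, B win %), each rounded to one decimal place."""
    total = wins_a + ties + wins_b
    if total == 0:
        raise ValueError("no votes recorded")
    return tuple(round(100 * n / total, 1) for n in (wins_a, ties, wins_b))


# Hypothetical counts (not published on the page) that reproduce the tally.
print(win_rates(5, 0, 2))  # (71.4, 0.0, 28.6)
```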

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

Votes: Grok Imagine Image Pro 100%, ties 0%, Z-Image Turbo 0%

AI Judge Analysis

Grok Imagine Image Pro

+ Excellent photographic realism with high-quality textures on the wood and book.
+ Captures the interaction of light and glass very naturally.
+ The plant is large and clearly visible as requested.

− The glass refractive physics are slightly off, making the blue sphere appear to be duplicated inside.

Z-Image Turbo

+ Clean, simple composition that matches all prompt requirements.
+ Good color saturation and accurate placement of objects.
+ Realistic lighting from the left window.

− The plant in the background is very blurred, failing the 'partially visible through the glass' detail compared to Grok Imagine Image Pro.
− Object textures are slightly softer and less detailed than the competitor.

Verdict: Both models followed the prompt perfectly in terms of object placement. Grok Imagine Image Pro is the winner due to its superior texture quality and the way it successfully rendered the plant being visible through the glass cube, whereas Z-Image Turbo blurred the background excessively.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

Votes: Grok Imagine Image Pro 100%, ties 0%, Z-Image Turbo 0%

AI Judge Analysis

Grok Imagine Image Pro

+ Excellent adherence to the 'repairing' action with realistic hand placement and a tool.
+ Strong cinematic composition with beautiful light reflections and background bokeh.
+ Perfect interpretation of the 'motion blur from passing cars' request.

− The wrench is slightly misshapen and doesn't logically grip the bolt.
− The bicycle is missing its front wheel despite the kickstand being down.

Z-Image Turbo

+ Good realistic skin texture and natural lighting.
+ Captures the light rain effect and wet pavement well.
+ Functional bicycle anatomy.

− Failed the 'repairing' prompt; the man is simply holding or mounting the bike.
− The car in the background lacks the requested motion blur.
− The framing is very tight and lacks the cinematic depth of the other model.

Verdict: Grok Imagine Image Pro produced a much more cinematic and atmospheric image that strictly followed the complex prompt requirements, including the specific 'repairing' action and 'motion blur' on cars. While it suffered from a logical error (missing front wheel), Z-Image Turbo failed to capture the central action of the prompt, making the man appear to be just standing with the bike rather than repairing it.

Fantasy Warrior

Text-to-Image

“Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”

Votes: Grok Imagine Image Pro 0%, ties 0%, Z-Image Turbo 100%

AI Judge Analysis

Grok Imagine Image Pro

+ Excellent rendering of engraved armor with legible Latin text
+ Superior skin texture with realistic pores, scars, and grime
+ Very clear and detailed beads within the braided hair

− The sparks look a bit like uniform points of light rather than organic embers

Z-Image Turbo

+ Strong atmospheric lighting from an actual torch source
+ Good use of depth of field and bokeh
+ Effective portrayal of braided hair and intricate armor engravings

− Skin texture appears smoother and less detailed than Grok Imagine Image Pro's
− The torch flame has a slightly 'pasted on' digital look

Verdict: Grok Imagine Image Pro produces a much more technically impressive portrait with incredible surface detail on the skin, armor, and leather straps, perfectly capturing the 'battle-worn' aesthetic. Z-Image Turbo provides a more cinematic composition with the inclusion of the torch, but it falls short of the sharpness and fine-grained texture found in Grok's output.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

Votes: Grok Imagine Image Pro 50%, ties 0%, Z-Image Turbo 50%

AI Judge Analysis

Grok Imagine Image Pro

+ Perfect text rendering for all section headers.
+ Highly consistent high-quality food photography with a unified lighting style.
+ Clean, professional grid layout that strictly follows the modern minimalist aesthetic.

− Lacks specific menu item names and prices usually found in a design layout.
− The 'vibrant accents' are limited to simple colored underlines.

Z-Image Turbo

+ Includes more realistic menu elements like prices and item lists.
+ Dynamic use of layout with orange color blocking for a casual dining feel.

− Significant text errors including 'PIZZA MANS' and 'SE IIION'.
− Inconsistent food photography and a cluttered bottom section.
− Fails to align the photos correctly with the labeled sections (e.g., pizza is shown under Appetizers).

Verdict: Grok Imagine Image Pro produced a much cleaner and more professional design that accurately reflects the 'minimalist' prompt with perfect text rendering. While Z-Image Turbo attempted a more complex layout with prices and item names, it failed due to significant spelling errors, a lack of alignment between categories and photos, and a cluttered overall appearance.

Chalkboard Menu

Text-to-Image

“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”

Grok Imagine Image Pro

Z-Image Turbo

AI Judge Analysis

Grok Imagine Image Pro

+ Excellent chalk texture with realistic dusty smudges and varying stroke thickness
+ Flawless spelling and complete adherence to the truncated prompt text
+ Highly realistic chalkboard and wooden frame appearance

− The title is in all-caps rather than the requested 'elegant cursive'

Z-Image Turbo

+ Natural-looking chalk handwriting style
+ Good layout and spacing on the board

− Spelling error in 'Mustroom' instead of Mushroom
− The date line has uneven formatting with a large leading hyphen
− Handwriting looks slightly more like a digital font mimic than organic chalk

Verdict: Grok Imagine Image Pro is the clear winner for its superior realism and perfect spelling; the chalk texture feels authentic to a physical board. While both models missed the 'elegant cursive' instruction for the title, Z-Image Turbo failed on basic spelling and produced a less convincing chalk effect.

The Capybara Taxi Driver

Text-to-Image

“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”

Grok Imagine Image Pro

Z-Image Turbo

AI Judge Analysis

Grok Imagine Image Pro

+ Excellent text rendering on the hat with contextually accurate 'TLC Medallion' details.
+ Great composition that captures both the driver and the passenger as requested.
+ High level of realism in the textures of the capybara's fur and the passenger's coat.

− The capybara's paws look slightly more like human hands/claws rather than natural paws on a steering wheel.

Z-Image Turbo

+ Features a more traditional livery-style driver's cap.
+ Good lighting on the capybara's face.

− The passenger is placed in the front passenger seat instead of the back seat.
− The background city lights are very generic and don't clearly evoke 'Manhattan'.
− The capybara's paws are poorly rendered and do not appear to be actually holding the wheel.

Verdict: Grok Imagine Image Pro followed the prompt much more accurately, correctly placing the passenger in the back seat and providing a convincing Manhattan backdrop. Z-Image Turbo failed on the spatial arrangement of characters and had weaker detail in the paws and background. Grok's touch with the specific 'NYC TLC' text on the hat adds a layer of realism and attention to detail that makes it the clear winner.

Bald man challenge

Image Editing

Edit instruction

“Give the person a full, thick head of natural hair with realistic texture, density, and a natural hairline. Preserve facial features and lighting.”

Votes: Grok Imagine Image Pro 100%, ties 0%, Z-Image Turbo 0%

AI Judge Analysis

Grok Imagine Image Pro

+ Excellent source preservation, keeping the face, glasses, and background virtually untouched.
+ Highly realistic hair texture and natural integration with the original lighting.
+ Perfectly follows the instruction for a 'full, thick head of hair'.

− None identified.

Z-Image Turbo

+ Successfully added a short buzz cut texture.

− Failed to provide 'full, thick' hair as requested, essentially keeping the person bald.
− Removed the subject's glasses.
− Significantly altered facial features and changed the background environment.

Verdict: Grok Imagine Image Pro performed a near-flawless edit, seamlessly adding a realistic full head of hair while preserving every other detail of the source image perfectly. In contrast, Z-Image Turbo failed most aspects of the prompt: it did not provide thick hair, removed the subject's glasses, and altered both the face and background.

Isometric Miniature Diorama Scenes

Text-to-Image

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”

Votes: Grok Imagine Image Pro 100%, ties 0%, Z-Image Turbo 0%

AI Judge Analysis

Grok Imagine Image Pro

+ Accurately renders the Japanese flag icon as requested.
+ Features high-quality, realistic PBR materials and textures on the sushi and wooden base.
+ Comprehensive interpretation of the scene with multiple sushi types and excellent centering.

− The camera angle is slightly lower than the requested 45-degree isometric view.

Z-Image Turbo

+ Perfectly captures the isometric diorama style requested.
+ Clean, minimalist composition with bold, readable text.

− Incorrectly uses the Chinese flag instead of the Japanese flag.
− Texture on the fish looks a bit plasticky compared to Grok Imagine Image Pro.
− The green garnish is strangely embedded inside the rice roll.

Verdict: Grok Imagine Image Pro is the superior choice because it correctly followed the cultural context of the prompt, including the Japanese flag, whereas Z-Image Turbo displayed the Chinese flag. Additionally, Grok Imagine Image Pro showcased much better material rendering with varied textures and realistic lighting that truly feels like a 3D miniature.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

Votes: Grok Imagine Image Pro 0%, ties 0%, Z-Image Turbo 100%

AI Judge Analysis

Grok Imagine Image Pro

+ Includes every animal type requested.
+ Beautiful lighting with clear god rays and dew sparkles as described.
+ Highly detailed meadow with diverse flora.

− Generated two kittens instead of one requested tabby kitten.
− The fox's anatomy, particularly the paws, looks slightly distorted.

Z-Image Turbo

+ Soft, pleasing bokeh and colors that suit a wholesome vibe.
+ Good fur textures and expressive eyes on the animals.
+ Correctly features one of each animal type.

− The kitten is missing its body, appearing as just a head behind the bunny.
− The puppy's front paw is blending awkwardly into the bunny's back.
− Missing the 'god rays' aspect of the lighting prompt.

Verdict: Grok Imagine Image Pro produced a much more complete and technically sound composition with beautiful lighting effects, despite adding an extra kitten. Z-Image Turbo struggled with spatial coherence, resulting in a floating kitten head and merged limbs between the puppy and bunny.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

Grok Imagine Image Pro

Z-Image Turbo

AI Judge Analysis

Grok Imagine Image Pro

+ Accurate text rendering for both 'Caffè Florian' and 'Est. 1720'.
+ Clean circular emblem composition with a nice subtle paper texture.
+ Strong vector aesthetic with consistent line weighting.

− The silver/grey cloche feels slightly out of place in a 'warm brown and cream' color palette.
− The steam icon is a bit thick and looks like a single comma.

Z-Image Turbo

+ Excellent adherence to the 'warm brown and cream' color palette.
+ Superior integration of the 'Est. 1720' banner as requested in the prompt.
+ Elegant typography that fits the vintage minimalist aesthetic perfectly.

− Slightly less 'subtle' texture on the background than Grok Imagine Image Pro.
− The cloche illustration is a bit more simplified/abstract compared to a traditional dome.

Verdict: Z-Image Turbo is the winner because it adhered more closely to the specific prompt elements, most notably the 'Est. 1720 banner' and the warm color scheme. While Grok Imagine Image Pro produced a very clean emblem, its cloche was grey rather than brown-toned, and it missed the explicit 'banner' layout for the date.

Next steps

Explore each model

Grok Imagine Image Pro

xAI

xAI's premium image generation model, offering higher-fidelity output and stronger performance on single-image editing benchmarks than the standard Grok Imagine model.

Z-Image Turbo

Alibaba

Tongyi-MAI's 6-billion-parameter distilled text-to-image model, optimized for speed: it achieves high-quality generation in 8 steps or fewer and supports bilingual text rendering.
