Grok Imagine Image vs Z-Image Turbo
Head-to-head across 11 challenges
Grok Imagine Image: 51.6% win rate
Ties: 3.2%
Z-Image Turbo: 45.2% win rate
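The page does not say how these percentages are derived. As a minimal sketch of the arithmetic, assuming each percentage is simply an outcome's share of all head-to-head votes, the tally could look like the following. The function name `win_rates`, the labels, and the 31-vote split are hypothetical; they are chosen only because they happen to reproduce the figures shown, and are not data from this page.

```python
from collections import Counter


def win_rates(outcomes):
    """Return each outcome's share of the total as a percentage (one decimal)."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcomes to tally")
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical tally (an assumption, not a figure from this page):
# 31 votes split 16 / 1 / 14 between the two models and ties.
votes = ["grok_imagine_image"] * 16 + ["tie"] * 1 + ["z_image_turbo"] * 14
rates = win_rates(votes)
print(rates)
```

Under this assumption the three shares sum to 100%, which matches the scoreboard (51.6 + 3.2 + 45.2).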
Challenge Results
Geometric Composition
Text-to-Image: “A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”
AI Judge Analysis
Grok Imagine Image
- + Excellent photographic quality and lighting
- + Realistic glass refractions and plant visibility
- + Accurate spatial arrangement of all components
- − The sphere is floating in the center rather than resting on the bottom
Z-Image Turbo
- + The sphere is logically resting on the cube's base
- + Clean and simple composition
- − The plant is extremely blurry/out of focus and barely recognizable
- − The glass cube has an illogical mirrored/opaque bottom
- − Lower overall resolution and detail compared to the other model
Verdict: Grok Imagine Image produced a much more visually compelling and detailed image with sophisticated lighting and realistic glass refractions. While the sphere is floating in Grok's version, Z-Image Turbo struggled with the 'plant' element and the material physics of the cube, making the bottom look like a mirror rather than clear glass.
Candid Street Photography
Text-to-Image: “A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”
AI Judge Analysis
Grok Imagine Image
- + Excellent adherence to technical prompts like motion blur, 50mm shallow depth of field, and cinematic lighting.
- + High level of realism with 'imperfect' framing that feels like a true candid photograph.
- + Strong atmosphere with convincing wet pavement reflections and rain effects.
- − The man's face is obscured and his hands are slightly blurred, making it harder to judge skin texture.
- − The mask on the man feels like an incidental detail that may or may not be desired in a 'cinematic' shot.
Z-Image Turbo
- + Clearer depiction of the subject's face and natural skin texture.
- + Good representation of the Japanese elderly demographic.
- − Failed to include the requested motion blur on passing cars.
- − The lighting and composition feel flat and less 'cinematic' compared to Grok Imagine Image.
- − Vertical lines on the cars and background appear as rain artifacts rather than natural falling rain.
Verdict: Grok Imagine Image better captured the specific technical requirements of the prompt, including motion blur, shallow depth of field, and 'imperfect' framing, resulting in a much more cinematic and atmospheric image. Z-Image Turbo produced a clear but relatively generic photo that ignored the motion blur and lighting style requested. Grok Imagine Image is the winner for its superior composition and adherence to the filmic aesthetic.
Fantasy Warrior
Text-to-Image: “Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”
AI Judge Analysis
Grok Imagine Image
- + Exceptional engraving detail on the plate armor
- + Follows the request for a close portrait with a dramatic composition
- + Beautifully rendered bokeh sparks and lighting
- − The hair braids merge unnaturally into the armor at the bottom
- − Scars look a bit like digital paint strokes rather than skin texture
Z-Image Turbo
- + More realistic 'battle-worn' appearance with bruised and textured skin
- + Natural integration of hair beads and braided strands
- + Excellent representation of the cloth and chainmail underlayers
- − The torch in the foreground feels slightly disconnected from the hand/shoulder area
- − Less ornate armor engraving compared to Grok Imagine Image
Verdict: Grok Imagine provides a more striking, high-contrast portrait with incredible detail in the armor engravings, while Z-Image Turbo excels at capturing a more grounded, 'battle-worn' look with realistic skin texture and complex layering of clothing and beads. Grok Imagine is the winner for its superior visual polish and adherence to the 'close portrait' framing.
Modern Clean Menu
Text-to-Image: “Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”
AI Judge Analysis
Grok Imagine Image
- + Excellent adherence to the 'sections' part of the prompt with clear Appetizers, Pizza, and Mains headings.
- + The layout feels more like a functional, professional restaurant menu.
- + Features high-quality, vibrant food photography integrated well within the text flow.
- − Contains significant text repetition (e.g., 'Grilled Salmon' and 'Steak Frites' listed multiple times).
- − Small body text is mostly illegible gibberish.
Z-Image Turbo
- + Perfectly executes the 'grid' requirement for food photos mentioned in the prompt.
- + Includes price points, which adds to the realism of a menu layout.
- + Clean, minimalist use of bold orange accents and balanced whitespace.
- − Fails to include all requested sections, merging 'Pizza' and 'Mains' into a garbled 'PIZZA MANS' heading.
- − General text rendering for menu items is very poor and looks like random character strings.
Verdict: Grok Imagine produces a more cohesive and professional-looking menu layout that includes all requested sections (Appetizers, Pizza, and Mains), though it suffers from item repetition. Z-Image Turbo captures the requested 'grid' aesthetic perfectly but fails on the semantic structure of the menu headings, resulting in nonsensical labels like 'PIZZA MANS'. Grok Imagine is preferred because its design is more practical and aligns closer to the layout expectations of a real restaurant.
Bald man challenge
Editing: “Give the person a full, thick head of natural hair with realistic texture, density, and a natural hairline. Preserve facial features and lighting.”
AI Judge Analysis
Grok Imagine Image
- + Perfectly preserves facial features, clothing, and background from the source image
- + Successfully adds a full head of hair that looks natural
- + Maintains consistent lighting and color temperature
- − The hairline transition on the forehead shows a slightly sharp edge under very close inspection
Z-Image Turbo
- + Maintains the general aesthetic of the original scene
- − Fails to follow the core instruction of adding a full, thick head of hair
- − Significantly alters the subject's face, making them look like a different person
- − Removes the subject's glasses and changes the background/landscape
Verdict: Grok Imagine performed a near-perfect edit, successfully adding the requested hair while keeping every other detail of the source image identical. Z-Image Turbo failed the prompt by only adding stubble rather than a full head of hair, while also unnecessarily changing the person's facial structure, removing their glasses, and altering the background.
Isometric Miniature Diorama Scenes
Text-to-Image: “Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”
AI Judge Analysis
Grok Imagine Image
- + Perfectly followed text instructions for 'JAPAN' and 'SUSHI' with excellent typography.
- + Correctly identified and rendered the Japanese flag icon.
- + Created a detailed and varied spread of sushi that fits the 3D miniature theme well.
- − The shadows are slightly harsh for a 'gentle lighting' request.
- − The perspective, while isometric, is a bit steep compared to the request.
Z-Image Turbo
- + Excellent soft, refined textures and 'gentle lighting' as requested.
- + Good use of PBR-style materials on the salmon topping.
- + Very clean diorama base execution.
- − Included an incorrect flag icon (Chinese flag instead of Japanese flag).
- − Text layout is off-center and the font is less professional than the competition's.
Verdict: Grok Imagine Image adhered much better to the specific details of the prompt, particularly the text alignment and the correct flag icon for Japan. While Z-Image Turbo produced very high-quality soft textures and lighting, the inclusion of the Chinese flag for a Japanese-themed prompt is a significant factual error.
Night Sky Transformation
Editing: “Change the scene to night: a deep, dark sky with subtle, glistening stars visible behind the mountain.”
AI Judge Analysis
Grok Imagine Image
- + Perfectly adheres to the night scene request with a deep dark sky and stars.
- + Exceptional source preservation, keeping the geological structure and village layout nearly identical.
- + Accurately adjusts the lighting on the mountain peaks and town to reflect a nighttime atmosphere.
- − None notable; the edit is highly effective.
Z-Image Turbo
- + Maintains high resolution and visual clarity.
- − Failed to change the scene to night, retaining the sunset/dawn lighting of the original.
- − The mountain peak and sky are fundamentally altered in shape and texture compared to the source.
- − Requested 'glistening stars' are entirely absent.
Verdict: Grok Imagine Image followed all instructions perfectly, transforming the sunset scene into a convincing night image while preserving the exact layout and structure of the source photo. Z-Image Turbo essentially ignored the core instruction to change the time of day, delivering another sunset image that significantly altered the shape of the Matterhorn mountain peak.
Adorable Baby Animals in Sunny Meadow
Text-to-Image: “Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”
AI Judge Analysis
Grok Imagine Image
- + Strong implementation of god rays and sunrise lighting
- + High level of detail in the surrounding wildflower meadow
- + Consistent soft focus/bokeh effect
- − The animals look more like airbrushed plushies than photorealistic creatures
- − Static composition where the animals are sitting still rather than 'chasing and tumbling'
- − The fox and rabbit have slightly uncanny, oversized features
Z-Image Turbo
- + Excellent depiction of motion and interaction ('tumbling together')
- + Photorealistic fur texture and lifelike animal expressions
- + Clearly visible butterflies and dew sparkles as requested
- − The puppy's front paw has some anatomical blending issues with the rabbit
- − Background lighting is a bit washed out compared to the dramatic rays in Grok Imagine Image's version
Verdict: While Grok Imagine Image has more dramatic lighting and 'god rays', Z-Image Turbo captures the action in the prompt far better, showing the animals actively playing and tumbling. Z-Image Turbo also achieves a significantly higher degree of photorealism in the animals' fur and anatomy, whereas Grok Imagine Image's result looks more like a stylized digital illustration.
Victorian Greenhouse Oasis
Text-to-Image: “Hyper-photorealistic interior of a lush Victorian glass greenhouse filled with exotic tropical plants, vibrant blooming orchids, tall ferns, colorful butterflies in flight, sunlight filtering through ornate glass roof creating realistic caustics and dew on leaves, intricate iron framework visible, misty atmosphere, 8K masterpiece.”
AI Judge Analysis
Grok Imagine Image
- + Excellent architectural symmetry and intricate Victorian ironwork details.
- + Beautiful dramatic lighting with god rays and hazy atmospheric perspective.
- + Rich, saturated colors and a wide variety of botanical life.
- − Butterflies feel like flat overlays and lack coherent spatial placement.
- − Minimal dew effects visible compared to the request.
- − Some light sources from the windows appear physically inconsistent.
Z-Image Turbo
- + Superior realism in plant textures, especially the dew drops on the leaves.
- + Butterflies feel better integrated into the 3D space.
- + Accurate representation of diverse orchids and tall tropical ferns as requested.
- − The composition feels a bit crowded and chaotic.
- − The mist effect is concentrated in a single central blob rather than being diffused.
- − The glass ceiling details are slightly less crisp than in Grok Imagine Image's version.
Verdict: While Grok Imagine produced a more grand and symmetrical architectural piece with better lighting, Z-Image Turbo captured the specific details of the prompt more effectively, particularly the 'dew on leaves' and the variety of 'tall ferns' and 'blooming orchids.' Grok Imagine's butterflies look like stickers placed on top of a background, whereas Z-Image Turbo achieves a higher level of photorealism in its vegetation.
Heroic Super Hero Portrait
Text-to-Image: “Hyper-photorealistic full-body portrait of a female superhero standing triumphantly on a New York skyscraper rooftop at golden sunset, wearing a classic modest superhero costume with flowing cape, chest emblem, gloves, and boots in red and blue colors, practical design, short hair, strong determined heroic expression looking into the distance, powerful confident stance with hands on hips and cape billowing dramatically in the wind, detailed urban cityscape background, warm natural sunlight with sharp shadows and fabric highlights, ultra-sharp textures on suit, hair, and concrete, 8K masterpiece, empowering family-friendly style.”
AI Judge Analysis
Grok Imagine Image
- + Captures a very dramatic, high-contrast lighting that emphasizes the sunset.
- + The cape flow is smooth and visually striking.
- + The silhouette and low-angle framing create a strong sense of scale.
- − The cape attachment to the shoulder is physically impossible and looks like a floating flap.
- − The city skyline is vague and lacks the distinct 'New York' features requested.
- − The costume design is a bit confusing with a red flap hanging between the legs.
Z-Image Turbo
- + Excellent adherence to the 'hands on hips' prompt while maintaining realistic finger details.
- + The background is much more detailed, featuring recognizable New York City architecture like the Freedom Tower.
- + Lighting is naturally diffused, giving it a more 'hyper-photorealistic' look as requested.
- − The hair is slightly longer than the 'short hair' requested, though still within a reasonable range.
- − The cape appears slightly cut off by the rooftop edge where it meets the ground.
Verdict: Z-Image Turbo is the clear winner, as it adheres more closely to the specific posing and location details, including recognizable New York landmarks. Grok Imagine Image has a dramatic silhouette, but it suffers from physically implausible costume details and a generic background that doesn't quite meet the prompt's request for a detailed urban cityscape.
Vintage Cafe Logo
Text-to-Image: “Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”
AI Judge Analysis
Grok Imagine Image
- + Excellent typography with correct accents and distinct classical style
- + High-quality vector aesthetic with professional shading and texture
- + Perfect fulfillment of the 'Est. 1720' banner and steam requirements
- − Redundant 'Est. 1720' text at the bottom
- − Includes an abstract spoon/handle shape that wasn't requested
Z-Image Turbo
- + Clean minimalist design that follows the 'less is more' approach
- + Accurate banner placement and correct typography including the accent
- + Strong warm brown and cream color palette
- − The cloche looks slightly more like a bowl or a striped dome than a standard cloche
- − The steam is very small and lacks the 'retro' stylistic weight of Grok Imagine Image's version
Verdict: Grok Imagine Image produced a much more professional-looking logo with superior typography and a sophisticated use of texture. While Z-Image Turbo followed the minimalist instruction well, its cloche illustration is less recognizable and the overall composition feels more like a basic clip-art logo compared to the polished, brand-ready look of Grok Imagine Image.
Grok Imagine Image
An image generation model by xAI designed to generate highly aesthetic images from text descriptions.
Z-Image Turbo
Tongyi-MAI's 6-billion-parameter distilled text-to-image model optimized for speed, achieving high-quality generation in 8 steps or fewer with support for bilingual text rendering.