Gemini 3.1 Flash with image generation capabilities. High-efficiency image generation model with support for text rendering, reference images, search grounding, and thinking mode. The efficient counterpart to Gemini 3 Pro Image.
Settled by community votes across 12 shared challenges, with an AI judge weighing in on each.
Nano Banana 2
#1 of 44 in Text-to-Image
Z-Image Turbo
#15 of 44 in Text-to-Image
Where the votes landed
Nano Banana 2
72.7%
win rate
Ties
9.1%
Z-Image Turbo
18.2%
win rate
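The page does not publish the raw vote counts behind these percentages. As a rough sketch of the arithmetic, the hypothetical helper `vote_shares` below turns three tallies into percentage shares. The counts 8, 1, and 2 are an assumption chosen only because they reproduce the figures above; they are not data taken from the arena.

```python
def vote_shares(wins_a: int, ties: int, wins_b: int) -> tuple[float, float, float]:
    """Convert raw tallies into percentage shares rounded to one decimal place."""
    total = wins_a + ties + wins_b
    if total == 0:
        raise ValueError("at least one decided matchup is required")
    return tuple(round(100 * n / total, 1) for n in (wins_a, ties, wins_b))


# Hypothetical tallies (assumed, not published on the page).
shares = vote_shares(8, 1, 2)
print(shares)
```

Other tallies with the same ratios would give the same shares, so the percentages alone do not pin down how many votes were cast.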
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Geometric Composition
Text-to-Image
“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”
AI Judge Analysis
Nano Banana 2
- + Excellent photographic realism with high-resolution textures on the wood and book.
- + Superb text rendering on the spine of the book.
- + Complex and accurate reflections and refractions through the glass surfaces.
- − The glass box looks more like a 5-sided display case with a mirrored base rather than a solid closed cube.
Z-Image Turbo
- + Clean and accurate adherence to all prompt elements including placement and lighting.
- + Good depth of field with realistic soft lighting from the left.
- − The plant in the background is very blurry and lacks detail compared to Nano Banana 2.
- − Minor artifacts on the book edges where it meets the glass.
Verdict: Nano Banana 2 produces a significantly more detailed and high-quality image, with impressive text rendering and rich textures. While Z-Image Turbo captures the scene accurately, it lacks the professional photographic finish and sharp clarity found in Nano Banana 2.
Candid Street Photography
Text-to-Image
“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”
AI Judge Analysis
Nano Banana 2
- + Excellent adherence to the 'repairing' aspect of the prompt
- + Highly atmospheric with cinematic lighting and realistic wet pavement reflections
- + Followed instructions for street-style framing and motion blur from background cars
Z-Image Turbo
- + Realistic skin texture on the man's arms and face
- + Accurate depiction of a red bicycle
- + Visible rain effect matches the prompt
- − Failed the core 'repairing' action; the man is just holding the handlebars
- − Lacks the requested 'motion blur' on passing cars
- − The composition is flat and less cinematic than requested
Verdict: Nano Banana 2 is the clear winner as it captures the narrative of 'repairing' the bicycle with great detail, including tools and a crouched pose, while also nailing the cinematic atmosphere and 'motion blur' requested. Z-Image Turbo produced a high-quality literal image, but the man is simply standing with the bike rather than repairing it, and it missed the stylistic camera effects like motion blur.
Fantasy Warrior
Text-to-Image
“Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”
AI Judge Analysis
Nano Banana 2
- + Incredible attention to detail on the engraved armor runes and leather strap texture.
- + Stronger adherence to the 'battle-worn' prompt with realistic dirt, blood, and skin texture.
- + Perfectly executed shallow depth of field with embers that feel integrated into the scene.
- − The hand gripping the sword has some anatomical merging issues with the crossguard.
- − The lighting feels a bit more like a studio setup than organic torchlight.
Z-Image Turbo
- + Natural feeling warmth from the torchlight reflecting off the polished sections of armor.
- + Excellent rendition of braided hair and small beads as requested.
- + Clean composition with a convincing cinematic atmosphere.
- − Texture on the leather and cloth underlayer is softer and less detailed than the other model.
- − The 'battle-worn' elements like scars and dirt appear more like applied makeup than deep-seated grime.
Verdict: Nano Banana 2 delivers a much more gritty and detailed image that fully captures the 'battle-worn' aesthetic, especially within the intricate engravings and weathered leather. Z-Image Turbo produces a high-quality cinematic portrait with superior lighting, but it lacks the fine texture and intensity requested for the paladin's gear and skin. Nano Banana 2 is the winner for its superior prompt adherence regarding texture and the 'battle-worn' theme.
Modern Clean Menu
Text-to-Image
“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”
AI Judge Analysis
Nano Banana 2
- + Excellent text legibility and logical menu structure with relevant item descriptions.
- + Perfect adherence to sections (Appetizers, Pizza, Mains) with corresponding imagery.
- + Clean, professional typography that actually reflects the food being shown.
- − Some minor text artifacts in long descriptions (e.g., 'romorvmArrese').
- − Layout is slightly more traditional than 'modern minimalist', feeling a bit crowded.
Z-Image Turbo
- + Strong color coordination with bold orange accents.
- + High-quality food photography with consistent lighting.
- − Serious spelling errors ('PIZZA MANS', 'SE TIIION').
- − Text under Appetizers is illegible gibberish.
- − Layout is chaotic with sections not properly aligned to the photos.
Verdict: Nano Banana 2 is much better, as it successfully creates a functional menu with legible sections, accurate food-to-text correspondence, and professional typography. Z-Image Turbo produces aesthetically pleasing food photos but fails significantly on text rendering and logical layout, resulting in nonsensical headings like 'PIZZA MANS'.
Magic Burger Explosion: Fiery Photorealism Challenge
Text-to-Image
“Ad for 'Magic Burger'. Dynamic, exploded burger with all components (bun, patty, cheese, lettuce, tomato, sauce) suspended in mid-air. Emphasize photorealistic detail and a sense of motion. Dark, fiery background with glowing embers. Integrate text: 'MAGIC BURGER' as a prominent title, 'LIMITED TIME ONLY' as a secondary message, and '€6.99' in a starburst, all rendered with a fiery, glowing effect.”
AI Judge Analysis
Nano Banana 2
- + Perfectly executes the 'exploded burger' concept with mid-air suspension.
- + Excellent adherence to all prompt details including the fiery starburst and specific text.
- + Dynamic composition with great lighting and atmospheric embers.
- − Some sauce splashes look slightly stylized rather than purely photorealistic.
Z-Image Turbo
- + High-quality texture on the bun and patties.
- + Clean, legible text for the price and sub-message.
- − Fails the 'exploded' requirement; the burger is assembled and floating as a block.
- − Repeated text ('MAGIC BURGER BURGER') is a text-rendering error.
- − Background is less dynamic and lacks the requested sense of motion.
Verdict: Nano Banana 2 is the clear winner as it fully understands the creative brief, specifically the 'exploded' nature of the burger, where components are individually suspended. In contrast, Z-Image Turbo provides an assembled burger and includes a text repetition error. Nano Banana 2's composition feels significantly more professional and aligned with modern advertising aesthetics.
Chalkboard Menu
Text-to-Image
“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
Nano Banana 2
- + Excellent chalk texture with realistic smudges and dust on the board.
- + Beautiful, consistent handwriting style that feels authentic to a café setting.
- + Strong composition with a warm, detailed background environment.
- − Minor spelling and spacing quirks, though all requested words are legible.
Z-Image Turbo
- + Perfectly legible and centered text.
- + Follows the multi-line prompt structure well.
- − Contains a spelling error ('Mustroom' instead of 'Mushroom').
- − The chalk texture is too clean and uniform, looking more like a digital font than hand-drawn chalk.
- − Lack of environmental context or background depth.
Verdict: Nano Banana 2 is the clear winner as it captures the 'handwritten chalk' aesthetic perfectly, complete with realistic smudging and varying pressure. Z-Image Turbo produces text that looks like a digital font and includes a spelling mistake in the first menu item.
The Capybara Taxi Driver
Text-to-Image
“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
Nano Banana 2
- + Excellent interior detail including radio, meter, and textures.
- + Very realistic lighting and blurred Manhattan city background.
- + Highly detailed fur and realistic hat placement.
- − Completely missed the passenger in the back seat.
- − The paws are somewhat human-like and merged with the steering wheel.
Z-Image Turbo
- + Successfully included the human passenger in the back seat as requested.
- + The capybara's expression and forward-facing pose look humorous and professional.
- + Layout captures the 'yellow taxi' color better in the frame.
- − The paws are poorly rendered and do not appear to be gripping the wheel realistically.
- − The background is less distinctly Manhattan compared to Nano Banana 2.
- − Lighting is a bit flat for a night scene.
Verdict: Nano Banana 2 produces a significantly higher-quality, more realistic interior with superior textures and atmosphere, but it fails to include the requested passenger. Z-Image Turbo adheres better to the prompt by including the businesswoman, though its overall image quality and detail are lower. Z-Image Turbo is the winner for actually fulfilling all elements of the complex prompt.
The Halloween Invitation
Text-to-Image
“Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”
AI Judge Analysis
Nano Banana 2
- + Excellent text rendering with no spelling errors.
- + Sophisticated illustrative style that fits the 'vintage gothic' and 'parchment' theme perfectly.
- + Strong composition with a cohesive ornamental border and scrollwork.
- − The scroll banner is integrated into the illustration rather than set apart as a distinct small banner, which is one reading of the prompt.
Z-Image Turbo
- + Dynamic lighting on the jack-o-lantern.
- + Includes clear thorny branches and spider web elements as requested.
- − Contains a spelling error in the location ('The Archves' instead of 'The Arches').
- − The composition feels fragmented with multiple disconnected scroll pieces.
- − The hierarchy of text is poor, with 'You are invited...' being tiny and positioned at the very top edge.
Verdict: Nano Banana 2 is the superior choice because it successfully followed all text instructions with perfect spelling and maintained a high-quality, professional illustrative style. Z-Image Turbo struggled with the layout, resulting in a cluttered composition and a notable spelling mistake in the event location.
Bald man challenge
Image Editing
“Give the person a full, thick head of natural hair with realistic texture, density, and a natural hairline. Preserve facial features and lighting.”
AI Judge Analysis
Nano Banana 2
- + Perfect adherence to the hair growth prompt
- + Exceptional preservation of the original face, clothes, and background
- + Seamless blending of new hair with the existing beard and sideburns
- − None identified
Z-Image Turbo
- + Preserved the face well
- − Failed to provide a 'full, thick head' of hair as requested
- − Altered the background textures and colors unnecessarily
- − The hair added is very sparse and receding, ignoring the core instruction
Verdict: Nano Banana 2 successfully fulfilled the request by adding a realistic, thick head of hair while perfectly maintaining every other aspect of the original image. In contrast, Z-Image Turbo largely failed the prompt, providing only a slight increase in stubble on top while also making unwanted changes to the background environment.
Isometric Miniature Diorama Scenes
Text-to-Image
“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”
AI Judge Analysis
Nano Banana 2
- + Excellent rendering of materials like wood grain, salmon texture, and rice with realistic PBR lighting.
- + Perfectly followed text instructions, including the flag of Japan icon.
- + Rich and detailed composition while maintaining a clean diorama feel.
- − The 3D cartoon style is slightly more realistic than 'cartoon', leaning towards high-end architectural visualization.
Z-Image Turbo
- + Strong adherence to the 'cartoon' and 'soft' aesthetic with simple, rounded shapes.
- + Clean, minimalistic layout that fits the square format well.
- − Major error: displayed the flag of China instead of the flag of Japan for a Japan-themed prompt.
- − The 3D modeling of the rice is overly simplified, appearing like large white balls.
- − The sushi composition is very basic compared to the diversity usually associated with the dish.
Verdict: Nano Banana 2 is the clear winner as it accurately followed all prompt instructions, including the correct national flag and high-quality PBR material rendering. Z-Image Turbo captured the 'cartoon' style well but failed significantly by including the flag of China for a Japanese sushi prompt.
Adorable Baby Animals in Sunny Meadow
Text-to-Image
“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”
AI Judge Analysis
Nano Banana 2
- + Excellent anatomical accuracy for all four animals.
- + Beautiful lighting with visible god rays and dew sparkles as requested.
- + Dynamic composition that conveys movement and playing.
- − The kitten's tail looks slightly disconnected from its body.
- − The background landscape is a bit generic.
Z-Image Turbo
- + Accurately captures the 'tumbling together' aspect of the prompt with physical interaction.
- + Very expressive faces on the puppy and fox.
- + Clean, soft bokeh effect in the background.
- − The kitten has an anatomical error with its paw emerging from the center of its chest/neck.
- − The butterflies are less integrated into the scene's lighting.
- − The puppy's paw on the rabbit looks a bit heavy/unnatural.
Verdict: Nano Banana 2 is the superior image because it manages to include all four requested animals with much better anatomical correctness and a more expansive, lush meadow environment. While Z-Image Turbo captures a cute moment of interaction, it suffers from significant anatomical merging issues where the kitten and puppy overlap.
Vintage Cafe Logo
Text-to-Image
“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”
AI Judge Analysis
Nano Banana 2
- + Perfect text rendering for both the main name and the banner
- + Exceptional vector texture and vintage aesthetic
- + Complete adherence to the banner and cloche requirements
- − The accent on the 'è' in Caffè is stylized and reads a bit like a stray stroke
Z-Image Turbo
- + Clean minimalist design
- + Accurate typography and color palette
- + Good use of negative space on the cloche
- − Missed the 'banner' requirement for the date
- − Layout is a bit generic compared to the requested emblem style
Verdict: Nano Banana 2 followed the prompt much more closely, including the requested banner for the date and a more sophisticated vector emblem style with subtle texture. While Z-Image Turbo produced a clean and usable logo, it opted for a standard linear layout and ignored the banner instruction.
Explore each model
Z-Image Turbo: Tongyi-MAI's 6-billion-parameter distilled text-to-image model, optimized for speed, achieving high-quality generation in 8 steps or fewer with support for bilingual text rendering.