Gemini 3.1 Flash with image generation capabilities. High-efficiency image generation model with support for text rendering, reference images, search grounding, and thinking mode. The efficient counterpart to Gemini 3 Pro Image.
Settled by community votes across 2 shared challenges, with an AI judge weighing in on each.
Nano Banana 2
#1 of 44 in Text-to-Image
Stable Diffusion 3.5 Medium
#41 of 44 in Text-to-Image
Where the votes landed
Nano Banana 2
100.0%
win rate
Ties
0.0%
Stable Diffusion 3.5 Medium
0.0%
win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
The Reversed Rodeo
Text-to-Image: “Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”
AI Judge Analysis
Nano Banana 2
- + Excellent adherence to the 'horse on top' prompt instruction
- + Rich, vibrant colors and high level of detail in both the nebula and spacesuit
- + Dynamic cinematic composition with a clear sense of movement
- − The horse's front hoof intersecting the helmet is a bit messy
- − Highly stylized/AI-core aesthetic might feel over-saturated to some
Stable Diffusion 3.5 Medium
- + Clean, photographic lighting on the astronaut and horse
- + High contrast between the subjects and the dark space background
- − Failed the primary prompt instruction by putting the astronaut on top
- − Anatomical issues with the horse's legs and hooves
- − Composition is static and less cinematic than requested
Verdict: Nano Banana 2 successfully interpreted the tricky surrealist prompt by correctly placing the horse on top of the astronaut. In contrast, Stable Diffusion 3.5 Medium ignored the specific positioning instruction and produced a standard 'astronaut riding horse' image with several anatomical glitches in the horse's legs.
The Halloween Invitation
Text-to-Image: “Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”
AI Judge Analysis
Nano Banana 2
- + Flawless text rendering for every specific detail requested.
- + Excellent composition that integrates the artistic elements and text seamlessly.
- + Strict adherence to all prompt elements including the scroll banner, specific border, and central jack-o-lantern.
- − None identified; the image met the complex text and layout requirements.
Stable Diffusion 3.5 Medium
- + Captures the 'twisted trees' and 'spooky border' requirements reasonably well.
- + Good use of color and high-contrast lighting on the jack-o-lanterns.
- − Failed significantly on text rendering with multiple spelling errors like 'Halloweeen' and 'Inviloween'.
- − Missing the specific requested text for the banner and incorrectly formatted the date and location.
- − Lacks the central jack-o-lantern required by the prompt, placing two off to the sides instead.
Verdict: Nano Banana 2 is the clear winner as it followed every instruction perfectly, specifically excelling at the difficult task of rendering long strings of custom text without a single error. Stable Diffusion 3.5 Medium struggled with the text and failed to include several key compositional elements like the central jack-o-lantern and the specific banner content.
Explore each model
Stability AI's 2.5-billion-parameter Multimodal Diffusion Transformer with improvements (MMDiT-X) text-to-image model, optimized for consumer hardware and featuring improved image quality, typography, and complex prompt understanding.