Imagen 4.0 Ultra Generate 001 vs Stable Diffusion 3.5 Large

Head-to-head across 10 challenges

Imagen 4.0 Ultra Generate 001

50.0%

win rate

Ties

3.8%

Stable Diffusion 3.5 Large

46.2%

win rate
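The three headline percentages can be reproduced from a simple vote tally. The sketch below is illustrative only: the page does not publish raw vote counts, so the 13 / 1 / 12 split is a hypothetical tally chosen because it rounds to the reported 50.0%, 3.8%, and 46.2%. The function name `win_rates` is likewise an assumption, not part of any published tooling.

```python
def win_rates(wins_a: int, ties: int, wins_b: int) -> tuple[float, float, float]:
    """Return (A win %, tie %, B win %), each rounded to one decimal place."""
    total = wins_a + ties + wins_b
    if total == 0:
        raise ValueError("no votes recorded")
    return tuple(round(100 * n / total, 1) for n in (wins_a, ties, wins_b))


# Hypothetical tally (not taken from the page): 13 wins for Imagen 4.0 Ultra,
# 1 tie, 12 wins for Stable Diffusion 3.5 Large, 26 votes in total.
print(win_rates(13, 1, 12))  # (50.0, 3.8, 46.2)
```

Because the rates are shares of one vote pool, they sum to 100% up to rounding, and a 3.8% tie share corresponds to a single tie in a pool of this size.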

Challenge Results

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

  • + Perfect adherence to spatial instructions with the book clearly sitting on top of the cube.
  • + Very high visual quality with realistic lighting, textures on the book, and wood grain.
  • + Correct interpretation of the plant being behind the cube and visible through the glass refraction.
  • - The blue sphere appears to be floating inside the cube without a physical support, which looks slightly surreal.
  • - The glass cube is rendered more like a solid block of glass/crystal than a hollow cube.

Stable Diffusion 3.5 Large

  • + Clean, modern aesthetic with realistic surface imperfections like dust on the glass.
  • + Strong lighting and shadow work that grounds the objects onto the table.
  • - Failed the spatial requirement: the book is inside the cube rather than on top of it.
  • - The glass cube's edges and corners are slightly inconsistent in how they overlap the book.
  • - The plant is mostly to the side/front-left rather than behind the cube as requested.

Verdict: Imagen 4.0 Ultra followed every spatial instruction perfectly, correctly placing the red book on top of the glass cube and the plant behind it. Stable Diffusion 3.5 Large failed the primary prompt requirements by placing the book inside the cube and the plant in front, essentially reversing the requested layout.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

Imagen 4.0 Ultra Generate 001: 0% wins | Ties: 0% | Stable Diffusion 3.5 Large: 100% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

  • + Excellent photographic realism with natural skin textures and believable lighting.
  • + Superior rendering of fine details like the beads of rain on the jacket and the mechanical parts of the bike.
  • + Successfully achieves the requested shallow depth of field and 50mm look.
  • - Misses the 'motion blur from passing cars' requirement as the car in the background is sharp.
  • - The framing feels a bit too composed despite the 'imperfect framing' request.

Stable Diffusion 3.5 Large

  • + Captures the rainy atmosphere well with visible rain streaks and strong ground reflections.
  • + Better adherence to the 'imperfect framing' and 'motion blur' aspect of the prompt with the background vehicles.
  • - Anatomical issues with the man's hands and arms, which look distorted.
  • - The image has a visible painterly/CG sheen that contradicts the 'no stylization' request.
  • - The light rain looks more like heavy vertical lines than realistic precipitation.

Verdict: Imagen 4.0 Ultra produces a much more realistic and high-quality photograph with stunning detail in the skin and clothing textures. Stable Diffusion 3.5 Large follows the 'motion blur' prompt instruction more closely but fails on a technical level with distorted anatomy and a stylized, less realistic finish.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

Imagen 4.0 Ultra Generate 001: 20% wins | Ties: 0% | Stable Diffusion 3.5 Large: 80% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

  • + Excellent adherence to the grid layout for food photos.
  • + Clean, professional typography that is highly legible as a menu.
  • + Appropriate categorization of sections (Appetizers, Pizza, Mains) following the prompt.
  • - Nonsense text for dish names and descriptions.
  • - Small visual artifacts in some of the smaller food thumbnails.

Stable Diffusion 3.5 Large

  • + Bold, stylish 'Menu' title and high-quality food photography.
  • + Creative use of a vertical center column with side grid elements.
  • + High-resolution textures in the food images.
  • - Poor adherence to the requested layout; sections are cramped and illegible.
  • - Significant spelling errors in headers (e.g., 'MAIMAES' for Mains, 'APPETIZRS').
  • - The grid layout feels less like a functional menu and more like a collage.

Verdict: Imagen 4.0 Ultra successfully captures the essence of a modern minimalist menu design with a practical, clean layout and distinct sections for different food types as requested. While Stable Diffusion 3.5 Large has very high-quality food imagery, its layout is chaotic, the text is cluttered, and it fails to create a professional, usable menu structure.

Isometric Miniature Diorama Scenes

Text-to-Image

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”

Imagen 4.0 Ultra Generate 001: 0% wins | Ties: 0% | Stable Diffusion 3.5 Large: 100% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

  • + Perfect adherence to text placement and formatting instructions.
  • + Excellent clean, low-poly isometric aesthetic with high clarity.
  • + Very accurate 45° top-down perspective and centered diorama.
  • - The textures are more plastic-like than 'realistic PBR' materials.
  • - The lighting is a bit flat compared to the requested soft refined look.

Stable Diffusion 3.5 Large

  • + Good material textures on the salmon and base, showing more depth.
  • + Dynamic 3D representation of the flags and vegetation.
  • + Vibrant color palette and appealing soft lighting.
  • - Failed to place text at 'top-center' as specified, instead attaching it to a flag.
  • - The 'Japan' and 'Sushi' text is cluttered and not 'large bold' across the top.
  • - Includes excessive garnish/decoration, disregarding the 'minimal' request.

Verdict: Imagen 4.0 Ultra is the clear winner due to its superior adherence to the layout and typography instructions. It correctly placed the text at the top-center of the frame and followed the minimal garnish prompt, whereas Stable Diffusion 3.5 Large integrated the text into the scene on a flag and added much more visual noise than requested.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

Imagen 4.0 Ultra Generate 001: 100% wins | Ties: 0% | Stable Diffusion 3.5 Large: 0% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

  • + Excellent depiction of the four specific animal species requested.
  • + Strong adherence to lighting effects like god rays and dew sparkles.
  • + High clarity and vibrant colors throughout the composition.
  • - The style leans more toward a digital illustration than the requested hyper-photorealistic look.
  • - The butterfly sizes and placement feel a bit artificial/arranged.

Stable Diffusion 3.5 Large

  • + Successfully captures a more photorealistic look with natural motion blur.
  • + Dynamic composition showing the animals actually running/chasing.
  • + Soft, realistic fur textures and beautiful bokeh.
  • - The fox and the kitten look very similar in facial structure, losing some species distinctness.
  • - Does not show the 'dew sparkles' or 'god rays' as clearly as the other model.
  • - The kitten's ears are somewhat overly pointed, resembling a lynx or fox more than a tabby kitten.

Verdict: Imagen 4.0 Ultra adhered much more closely to the specific details of the prompt, including the god rays, dew, and distinct animal species, though its style is quite illustrative. Stable Diffusion 3.5 Large achieved a much more realistic photographic aesthetic and a better sense of motion, but it rendered the requested lighting effects less clearly and its animals look somewhat generic. Imagen 4.0 Ultra is the winner for its superior detail and exact species representation.

Victorian Greenhouse Oasis

Text-to-Image

“Hyper-photorealistic interior of a lush Victorian glass greenhouse filled with exotic tropical plants, vibrant blooming orchids, tall ferns, colorful butterflies in flight, sunlight filtering through ornate glass roof creating realistic caustics and dew on leaves, intricate iron framework visible, misty atmosphere, 8K masterpiece.”

Imagen 4.0 Ultra Generate 001: 80% wins | Ties: 0% | Stable Diffusion 3.5 Large: 20% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

  • + Exceptional color variety and vibrancy in the orchid blooms.
  • + Highly detailed and realistic rendering of the ferns covering the ground.
  • + Very intricate and consistent Victorian ironwork throughout the structure.
  • - Butterflies look somewhat static and pasted-on rather than naturally integrated.
  • - The composition feels slightly crowded with no clear floor space.

Stable Diffusion 3.5 Large

  • + Excellent atmospheric light rays and misty quality that feels more natural.
  • + Superior depth of field and composition with a clear floor path leading to the door.
  • + More naturalistic movement and integration of the butterflies in the scene.
  • - The plant life is less diverse than in Imagen 4.0 Ultra's image, with fewer blooming orchids.
  • - The ironwork details in the upper-right corner become slightly muddy and lose definition.

Verdict: Both models captured the prompt well, but Imagen 4.0 Ultra excelled in botanical variety and the sheer density of requested elements like orchids and ferns. Stable Diffusion 3.5 Large produced a moodier, more artistically composed image with superior atmospheric lighting and a better sense of spatial depth, though it was less 'lush' in its placement of flowers.

Heroic Super Hero Portrait

Text-to-Image

“Hyper-photorealistic full-body portrait of a female superhero standing triumphantly on a New York skyscraper rooftop at golden sunset, wearing a classic modest superhero costume with flowing cape, chest emblem, gloves, and boots in red and blue colors, practical design, short hair, strong determined heroic expression looking into the distance, powerful confident stance with hands on hips and cape billowing dramatically in the wind, detailed urban cityscape background, warm natural sunlight with sharp shadows and fabric highlights, ultra-sharp textures on suit, hair, and concrete, 8K masterpiece, empowering family-friendly style.”

Imagen 4.0 Ultra Generate 001: 67% wins | Ties: 33% | Stable Diffusion 3.5 Large: 0% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

  • + Perfect adherence to the hands-on-hips pose request.
  • + Extremely clean and coherent costume design with logical tailoring.
  • + Great rendering of the cape's weight and dramatic billowing.
  • - The cityscape background is a bit sparse and desaturated.
  • - The chest emblem is a literal Superman 'S', which lacks originality.

Stable Diffusion 3.5 Large

  • + Rich, detailed, and vibrant urban background that captures the New York feel.
  • + High-quality metallic textures and lighting effects on the suit.
  • + Good interpretation of the 'short hair' requirement.
  • - Failed the pose requirement; hands are at sides rather than on hips.
  • - Anatomical issues where the legs meet the boots/suit and some distortion in the hands.
  • - The cape attachment looks unnatural, appearing to sprout from her shoulders rather than being worn.

Verdict: Imagen 4.0 Ultra is the superior choice because it followed the specific pose instructions (hands on hips) and produced a logically coherent character with realistic fabric and lighting. Stable Diffusion 3.5 Large has a much more impressive background and lighting, but it missed the pose requirement and has several anatomical and structural glitches in the lower body and cape.

Intricate Floral Mandala

Text-to-Image

“Perfectly symmetrical mandala made entirely of real flowers, petals, leaves, fruits, and seeds in vibrant natural colors, intricate layered patterns with radial symmetry, top-down view on a soft neutral background, hyper-detailed organic textures and subtle shadows, photorealistic, 8K masterpiece.”

Imagen 4.0 Ultra Generate 001: 50% wins | Ties: 0% | Stable Diffusion 3.5 Large: 50% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

  • + Excellent adherence to the 'photorealistic' requirement with naturalistic textures like water droplets and seed skins.
  • + Highly complex and dense composition that remains perfectly symmetrical.
  • + The neutral background has a high-quality fabric-like texture that adds to the realism.
  • - Colors are slightly muted compared to the 'vibrant' request, though more realistic.
  • - The composition feels very tight and cropped close to the edges.

Stable Diffusion 3.5 Large

  • + Beautifully vibrant colors and a clean, high-contrast presentation.
  • + Included a wider variety of distinct fruits like apples, oranges, and nuts as requested.
  • + Creates a very pleasant, illustrative floral pattern.
  • - Fails the 'photorealistic' requirement, appearing more like a 3D digital render or vector art.
  • - Symmetry is noticeably imperfect in the arrangement of the exterior fruits and seeds.
  • - The central floral patterns look synthetic rather than being made of real petals.

Verdict: Imagen 4.0 Ultra successfully captures the requested photorealistic style with intricate organic textures and perfect radial symmetry. While Stable Diffusion 3.5 Large has more vibrant colors and includes a better variety of fruits, it looks like a digital illustration rather than a photograph of real objects and struggles with the precise symmetry of the scattered outer elements.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

Imagen 4.0 Ultra Generate 001: 67% wins | Ties: 0% | Stable Diffusion 3.5 Large: 33% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

  • + Perfect adherence to typography with correct spelling and accent.
  • + Clean, professional vector emblem execution.
  • + Sophisticated use of subtle background texture and warm tones.
  • - The steam is very small and lacks visual impact.

Stable Diffusion 3.5 Large

  • + Stronger visual emphasis on the steam concept.
  • + Creative use of decorative corner elements and ornaments.
  • - Spelling error in the main brand name ('Cafféé' instead of 'Caffè').
  • - The cloche illustration appears disjointed and lacks the requested minimalism.
  • - The banner composition is cluttered with multiple competing elements.

Verdict: Imagen 4.0 Ultra is the clear winner for its professional execution and accurate text rendering. While Stable Diffusion 3.5 Large attempted a more elaborate design, it failed on basic spelling and the minimalist requirement, resulting in a cluttered logo. Imagen 4.0 Ultra successfully captured the vintage minimalist aesthetic with high-quality vector-style lines.

Apollo 11: Journey to Tranquility

Text-to-Image

“Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vector style, crisp lines, consistent iconography, subtle gradients only. Steps (stop at landing): 1. Launch (Saturn V icon) 2. Earth Orbit (Earth + orbit ring icon) 3. Translunar (trajectory arc icon) 4. Lunar Orbit (Moon + orbit ring icon) 5. Descent (lunar module descending icon) 6. Landing (lunar module on the surface icon) Small supporting elements (minimal text): • Crew strip: three silhouette icons with only last names: Armstrong, Aldrin, Collins. • Landing site marker: Moon pin labeled "Tranquility" only. Layout constraints: generous margins, large readable labels, clean background with subtle stars. Vector-only, print-poster look, high resolution.”

Imagen 4.0 Ultra Generate 001: 50% wins | Ties: 0% | Stable Diffusion 3.5 Large: 50% wins

AI Judge Analysis

Imagen 4.0 Ultra Generate 001

  • + Excellent adherence to the infographic structure and layout requested.
  • + Clear legibility of English heading text.
  • + Strong thematic color palette and consistent iconographic style.

Stable Diffusion 3.5 Large

  • + Highly detailed vector-style illustration.
  • + Creative interpretation of space schematics.
  • + Beautiful texture and lighting effects on the lunar surface.
  • - Failed to provide the requested step-by-step infographic structure.
  • - Included a Space Shuttle-style vehicle instead of a Saturn V rocket.
  • - Text is largely gibberish and placement is cluttered.

Verdict: Imagen 4.0 Ultra successfully followed the complex prompt instructions, creating a logical six-step infographic with readable headers and a clear visual flow. In contrast, Stable Diffusion 3.5 Large produced a visually stunning poster but failed on almost every technical requirement, including the specific rocket type and the sequential layout.

Imagen 4.0 Ultra Generate 001

Google's Imagen 4.0 Ultra model, offering the highest fidelity and resolution for professional-grade image generation.

Stable Diffusion 3.5 Large

Stability AI's 8.1-billion-parameter Multimodal Diffusion Transformer (MMDiT) text-to-image model, featuring improved image quality, typography, complex prompt understanding, and resource efficiency.