Grok Imagine Image (xAI) vs. Stable Diffusion 3.5 Large (Stability AI)

Settled by community votes across 8 shared challenges, with an AI judge weighing in on each.

Grok Imagine Image

24.1 arena score

#19 of 44 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.

Stable Diffusion 3.5 Large

22.9 arena score

#25 of 44 in Text-to-Image

Vote tally

Where the votes landed

Grok Imagine Image

31.3%

win rate

Ties

6.3%

Stable Diffusion 3.5 Large

62.5%

win rate

Shared challenges 8
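
The page publishes only percentages, not raw vote counts. As a rough sketch of the arithmetic behind the tally, the snippet below turns win/tie counts into one-decimal percentages. The counts 5, 1, and 10 are hypothetical: they are simply one set of values consistent with the published 31.3% / 6.3% / 62.5% split, and the function names are illustrative rather than anything the arena exposes. Half-up rounding is assumed because it reproduces the displayed figures.

```python
from decimal import Decimal, ROUND_HALF_UP


def share(count: int, total: int) -> Decimal:
    """Return count/total as a percentage, rounded half-up to one decimal."""
    pct = Decimal(count) * 100 / Decimal(total)
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def tally(a_wins: int, ties: int, b_wins: int) -> dict[str, Decimal]:
    """Convert raw vote counts into the three shares shown in a vote tally."""
    total = a_wins + ties + b_wins
    if total == 0:
        raise ValueError("no votes recorded")
    return {
        "a_win_rate": share(a_wins, total),
        "ties": share(ties, total),
        "b_win_rate": share(b_wins, total),
    }


# Hypothetical counts (not published on the page) that match the shown split.
rates = tally(a_wins=5, ties=1, b_wins=10)
for label, value in rates.items():
    print(f"{label}: {value}%")
```

Because each share is rounded independently, the three figures can sum to slightly more or less than 100%, as they do here (100.1%).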

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

Votes: Grok Imagine Image 0% wins, 0% ties, Stable Diffusion 3.5 Large 100% wins

AI Judge Analysis

Grok Imagine Image

+ Perfectly adheres to the spatial prompt with the book on top and the plant behind.
+ Excellent realism in the glass refraction and the wooden tabletop texture.
+ Accurately renders the 'soft window light' requested in the prompt.

− The blue sphere is levitating inside the cube without a physical support, which looks slightly unnatural.
− The cube dimensions are more rectangular than a perfect cube.

Stable Diffusion 3.5 Large

+ High clarity and sharp details on the glass edges and book pages.
+ Good rendering of the sphere sitting physically on the surface of the book.

− Failed the spatial logic of the prompt by putting the book inside/under the cube rather than on top.
− The plant is mostly to the side and reflecting in the glass rather than clearly 'behind' and visible through it.
− The light is quite harsh and direct, lacking the 'soft' quality requested.

Verdict: Grok Imagine followed every spatial instruction in the prompt, correctly placing the book on top of the cube and the plant behind it. Stable Diffusion 3.5 Large failed the core spatial relationship by placing the red book inside the cube, and it missed the 'soft' lighting requirement. Grok Imagine is the clear winner for superior prompt adherence and realistic lighting.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

Votes: Grok Imagine Image 0% wins, 0% ties, Stable Diffusion 3.5 Large 100% wins

AI Judge Analysis

Grok Imagine Image

+ Excellent depiction of motion blur from passing cars
+ Perfectly captures the 'imperfect framing' and 'candid' feel requested
+ Stronger adherence to the shallow depth of field and street photography aesthetic

− The subject's face is largely obscured
− The red bicycle frame has some minor geometric inconsistencies near the handlebars

Stable Diffusion 3.5 Large

+ Stronger visual of the rain itself falling in the air
+ Clearer look at the subject's face and natural skin texture
+ Vibrant colors and high resolution clarity

− Failed to include motion blur from passing cars
− The layout is too perfectly centered, missing the 'imperfect framing' requested
− The bicycle design is physically impossible with multiple disconnected frames

Verdict: Grok Imagine Image followed the technical prompts for camera work much better, successfully incorporating motion blur and an offset, candid composition. Stable Diffusion 3.5 Large produced a more traditional 'portrait' style that ignored the motion blur and imperfect framing requests, and the bicycle's structural logic is significantly flawed.

Fantasy Warrior

Text-to-Image

“Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”

Votes: Grok Imagine Image 0% wins, 50% ties, Stable Diffusion 3.5 Large 50% wins

AI Judge Analysis

Grok Imagine Image

+ Excellent adherence to the 'beads in hair' prompt with clearly visible details.
+ Very high quality engraving on the armor with beautiful metallic reflections.
+ Strong bokeh effect and atmospheric torchlight that fits the requested mood perfectly.

− The character's face looks a bit too pristine and 'model-like' for a battle-worn warrior despite the scars.
− Leather strap texture is present but a bit smooth compared to the armor.

Stable Diffusion 3.5 Large

+ The facial expression and grittier skin texture better convey the 'battle-worn' aspect of the prompt.
+ Superior rendering of the cloth underlayers, including chainmail and heavy woven fabric.
+ Natural looking braids and skin blemishes that add to the realism.

− Failed to include the specific 'small beads' in the hair braids requested in the prompt.
− The engraving on the armor is slightly messy and less defined than in Grok Imagine Image.

Verdict: Grok Imagine followed the prompt more closely by including the specific detail of beads in the hair and providing a more cinematic lighting setup. However, Stable Diffusion 3.5 Large captured the 'battle-worn' aesthetic more convincingly with realistic skin textures and complex cloth/chainmail layers. While Stable Diffusion 3.5 Large is more character-accurate to a paladin, Grok Imagine is the technical winner for satisfying all descriptive prompts including the smaller details.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

Votes: Grok Imagine Image 100% wins, 0% ties, Stable Diffusion 3.5 Large 0% wins

AI Judge Analysis

Grok Imagine Image

+ Excellent adherence to the content request, featuring clear sections for Appetizers, Pizza, and Mains.
+ Superior font rendering and overall layout that looks like a functional, professional menu.
+ Integrated food photos that fit naturally within the design flow rather than being boxed off.

− Several duplicate text entries (e.g., 'STUFFED MUSHROOMS' and 'STEAK FRITES' listed multiple times).
− Some minor spelling errors in small header text like 'ARIODE SALMON'.

Stable Diffusion 3.5 Large

+ High-quality, vibrant food photography with good clarity.
+ Creative grid layout that uses the food as a frame for the central text.

− Poor text rendering with several illegible words and symbols throughout the design.
− The grid layout makes it look more like a poster/advertisement than an actual functional menu.
− The sections requested (Appetizers, Pizza, Mains) are poorly labeled or misspelled (e.g., 'MAIMAES', 'APPETIZRS').

Verdict: Grok Imagine produced a far more usable and professional-looking menu that correctly interpreted the layout and content sections requested. While it suffered from some repetitive text entries, Stable Diffusion 3.5 Large failed significantly at text legibility and produced a layout that prioritized a photo grid over the functional requirements of a restaurant menu.

Isometric Miniature Diorama Scenes

Text-to-Image

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”

Votes: Grok Imagine Image 0% wins, 0% ties, Stable Diffusion 3.5 Large 100% wins

AI Judge Analysis

Grok Imagine Image

+ Excellent text rendering with clean, professional typography at the top-center.
+ Perfect adherence to the isometric 45-degree angle and diorama base request.
+ Highly clean, vector-like 3D aesthetic with a solid light blue background as requested.

− The sushi details inside the rolls are a bit generic and look more like simple geometric shapes than food.

Stable Diffusion 3.5 Large

+ Beautiful textures on the rice and fish, capturing the 'refined textures' prompt well.
+ Great 3D diorama feel with depth and varied garnish.

− Failed to place text at the top-center, instead integrating it into a flag on the base.
− The background is a gradient/shadowed surface rather than a solid light blue.
− The 'JAPAN' text has some rendering artifacts/bloating on the letters.

Verdict: Grok Imagine followed the layout instructions perfectly, placing the text at the top-center and using a solid background for a clean graphic design feel. Stable Diffusion 3.5 Large produced significantly more detailed and appetizing 3D models for the sushi itself, but it failed on several key composition prompts including text placement and background color.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

Votes: Grok Imagine Image 0% wins, 0% ties, Stable Diffusion 3.5 Large 100% wins

AI Judge Analysis

Grok Imagine Image

+ Excellent depiction of god rays and sunrise lighting
+ Very high sharpness and detail in the fur textures
+ Includes all requested animals in a clear, balanced composition

− The posing is static and 'portrait-like' rather than playful chasing
− The butterflies are tiny and look more like glowing insects
− Slightly AI-typical 'glossy' look to the eyes

Stable Diffusion 3.5 Large

+ Perfectly captures the 'playfully chasing' and action-oriented part of the prompt
+ Beautiful bokeh, dew sparkles, and lighting effects
+ Natural, dynamic poses for all four animals

− The fox kit in the background is slightly blurry and lacks the fine detail of the foreground animals
− The kitten's facial structure is slightly less defined compared to Grok Imagine Image

Verdict: Stable Diffusion 3.5 Large wins this comparison because it successfully captured the 'playfully chasing' and 'tumbling together' action requested in the prompt, whereas Grok Imagine produced a static group portrait. Stable Diffusion 3.5 Large also provided much better butterfly rendering and a more immersive, atmosphere-heavy scene with dew and bokeh.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

Votes: Grok Imagine Image 33% wins, 0% ties, Stable Diffusion 3.5 Large 67% wins

AI Judge Analysis

Grok Imagine Image

+ Perfect typography that accurately renders 'Caffè Florian' with the correct accent.
+ Clean, professional vector illustration style that fits a modern minimalist logo.
+ Strong use of the requested warm brown and cream tones with subtle grainy texture.

− Redundant 'Est. 1720' text appears twice in the stack.
− The brown banner for the date is slightly simplified compared to the vector style.

Stable Diffusion 3.5 Large

+ Excellent background texture with vintage corner flourishes that add to the 'classic' feel.
+ Good use of a literal banner for the shop name, creating a cohesive emblem shape.
+ Balanced composition with clear separation of elements.

− Spelling error in the main text ('Cafféé' instead of 'Caffè').
− The cloche and steam illustration is less refined and feels slightly generic.
− Logo elements (cloche and banner) are not as well-integrated into a singular emblem as in Grok Imagine Image.

Verdict: Grok Imagine is the winner because it provides a clean, professional vector logo with perfect text rendering and the correct accent on 'Caffè', making it far more usable. While Stable Diffusion 3.5 Large has a nice vintage background aesthetic, it fails to spell the brand name correctly and the central illustration lacks the polish found in Grok Imagine.

Apollo 11: Journey to Tranquility

Text-to-Image

“Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vector style, crisp lines, consistent iconography, subtle gradients only. Steps (stop at landing): 1. Launch (Saturn V icon) 2. Earth Orbit (Earth + orbit ring icon) 3. Translunar (trajectory arc icon) 4. Lunar Orbit (Moon + orbit ring icon) 5. Descent (lunar module descending icon) 6. Landing (lunar module on the surface icon) Small supporting elements (minimal text): • Crew strip: three silhouette icons with only last names: Armstrong, Aldrin, Collins. • Landing site marker: Moon pin labeled "Tranquility" only. Layout constraints: generous margins, large readable labels, clean background with subtle stars. Vector-only, print-poster look, high resolution.”

Votes: Grok Imagine Image 60% wins, 0% ties, Stable Diffusion 3.5 Large 40% wins

AI Judge Analysis

Grok Imagine Image

+ Strictly followed the requested 1-6 step structure with accurate labels.
+ Excellent text rendering for main headers and specific names like Armstrong and Aldrin.
+ Perfectly captured the flat-vector infographic aesthetic and the NASA-inspired color palette.

− Some minor garbled text in the small sub-captions (e.g., '3rajcoory').
− Iconography for the Moon includes Saturn-like rings which is scientifically inaccurate.

Stable Diffusion 3.5 Large

+ Clean, technical vector style with interesting layout elements.
+ Pleasing muted color palette that feels retro-futuristic.

− Failed to follow the numbered 6-step prompt structure entirely.
− Depicted a Space Shuttle-style craft instead of the Saturn V requested.
− Text is mostly illegible gibberish throughout the image.

Verdict: Grok Imagine is the clear winner as it followed every instruction in the prompt, including the specific 6-step sequence and the list of crew members. Stable Diffusion 3.5 Large produced a visually interesting technical drawing, but it failed on text legibility, structural adherence, and historical accuracy by showing a Space Shuttle instead of a Saturn V.

Next steps

Explore each model

Grok Imagine Image

xAI

An image generation model by xAI designed to generate highly aesthetic images from text descriptions.

Stable Diffusion 3.5 Large

Stability AI

Stability AI's 8.1-billion-parameter Multimodal Diffusion Transformer (MMDiT) text-to-image model, featuring improved image quality, typography, complex prompt understanding, and resource efficiency.
