Qwen Image 2512 (Alibaba) vs Stable Diffusion 3.5 Large (Stability AI)

Settled by community votes across 6 shared challenges, with an AI judge weighing in on each.

Qwen Image 2512

22.4 arena score

#26 of 44 in Text-to-Image

Stable Diffusion 3.5 Large

22.9 arena score

#25 of 44 in Text-to-Image

Vote tally

Where the votes landed

Qwen Image 2512: 80.0% win rate

Ties: 0.0%

Stable Diffusion 3.5 Large: 20.0% win rate

Shared challenges: 6
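
The page does not say how the headline win rates are derived from the individual challenges. One reading that reproduces the 80.0% / 0.0% / 20.0% tally is to count each challenge that has a vote split as a win for whichever model took the larger share. The sketch below works under that assumption only; the `VOTE_SPLITS` name and the `challenge_win_rates` helper are illustrative and not part of the arena itself. The percentages are copied from the challenge entries on this page, and the Isometric Miniature Diorama Scenes challenge is left out of the count because no vote split is shown for it.

```python
# Per-challenge community vote splits as (Qwen %, tie %, SD 3.5 Large %),
# copied from this page. A challenge with no vote split shown is None.
VOTE_SPLITS = {
    "Geometric Composition": (100, 0, 0),
    "Candid Street Photography": (100, 0, 0),
    "Modern Clean Menu": (33, 0, 67),
    "Isometric Miniature Diorama Scenes": None,
    "Adorable Baby Animals in Sunny Meadow": (100, 0, 0),
    "Vintage Cafe Logo": (100, 0, 0),
}


def challenge_win_rates(splits):
    """Return (first-model %, tie %, second-model %) over voted challenges.

    Assumption (not confirmed by the page): a challenge counts as a win for
    the model with the larger vote share, and as a tie when shares are equal.
    """
    voted = [s for s in splits.values() if s is not None]
    if not voted:
        return (0.0, 0.0, 0.0)
    first = sum(1 for a, _, b in voted if a > b)
    second = sum(1 for a, _, b in voted if b > a)
    ties = len(voted) - first - second
    return tuple(round(100 * c / len(voted), 1) for c in (first, ties, second))


print(challenge_win_rates(VOTE_SPLITS))  # (80.0, 0.0, 20.0)
```

Under this reading, Qwen Image 2512 takes four of the five voted challenges and Stable Diffusion 3.5 Large takes Modern Clean Menu. If the arena instead aggregates individual votes, the same figures could arise from a different calculation.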

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

Community vote: Qwen Image 2512 100%, ties 0%, Stable Diffusion 3.5 Large 0%

AI Judge Analysis

Qwen Image 2512

+ Perfectly follows the spatial instruction of the book sitting on top of the cube.
+ Excellent rendering of refraction and reflections within the glass panels.
+ Very realistic lighting and natural depth of field.

− The glass has a slightly green tint rather than being perfectly clear.

Stable Diffusion 3.5 Large

+ Very clean, high-resolution rendering of the blue sphere.
+ Strong adherence to the 'soft window light' lighting instruction.

− Failed the spatial instruction: the book is inside the cube instead of on top.
− The placement of the sphere on top of the book contradicts 'inside the cube' if the book is considered the base.
− The glass cube lacks a bottom edge, appearing to just sit around the book.

Verdict: Qwen Image 2512 followed all spatial instructions perfectly, accurately placing the red book on top of the glass cube and the sphere inside. Stable Diffusion 3.5 Large failed the core prompt logic by placing the book inside the cube and the sphere on top of the book, which also resulted in a less convincing physical setup.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

Community vote: Qwen Image 2512 100%, ties 0%, Stable Diffusion 3.5 Large 0%

AI Judge Analysis

Qwen Image 2512

+ Exceptional realism in skin texture and facial details.
+ Perfectly captures the 'candid street photo' look with natural lighting.
+ Successfully incorporates motion blur from passing cars in the background.

− The man is posing/looking at the camera rather than actively repairing the bike.
− The bike is somewhat simplified in its mechanical structure.

Stable Diffusion 3.5 Large

+ Shows the subject actively engaged in the task of repairing/working on the bike.
+ Good atmospheric rain effects and reflections on the pavement.
+ Captures the full silhouette of the bike and the man.

− The man's proportions and arms look slightly distorted/wiry.
− The background car lacks the requested motion blur, appearing more static.
− The image has a more 'digital' feel compared to the requested 50mm lens look.

Verdict: Qwen Image 2512 wins due to its incredible photographic realism and adherence to the 'natural skin texture' and '50mm lens' prompts, looking like a genuine street photograph. While Stable Diffusion 3.5 Large better captured the action of 'repairing', it suffered from slight anatomical irregularities and a less convincing cinematic texture.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

Community vote: Qwen Image 2512 33%, ties 0%, Stable Diffusion 3.5 Large 67%

AI Judge Analysis

Qwen Image 2512

+ Features a clear grid layout of food photos as requested.
+ Organized categorical sections (Appetizers, Mains, etc.) with pricing.
+ Good use of bold sans-serif fonts and vibrant accent colors.

− Text contains several spelling errors (e.g., 'APPEITIIZERS', 'MEANS').
− The small body text is somewhat garbled and contains artifacts.

Stable Diffusion 3.5 Large

+ High-quality food photography with a professional, clean aesthetic.
+ Excellent implementation of bold sans-serif typography.
+ Minimalist design creates a premium look for a casual dining setting.

− Layout is more of a spread than a single-page menu design.
− Text significantly degrades into gibberish in the lower sections.
− Food photos are cropped at the edges of the frame.

Verdict: Qwen Image 2512 produces a much more functional menu layout that follows the 'sections' part of the prompt more logically. While Stable Diffusion 3.5 Large has better individual photo quality and a sleek aesthetic, it feels more like a magazine spread than a usable restaurant menu, whereas Qwen Image 2512 captures the specific grid-and-list structure requested.

Isometric Miniature Diorama Scenes

Text-to-Image

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”

AI Judge Analysis

Qwen Image 2512

+ Excellent text rendering with 'JAPAN' and 'SUSHI' exactly as requested.
+ Perfectly captures the 'miniature 3D cartoon' aesthetic with smooth, clean textures.
+ Highly professional composition mirroring a polished mobile game or 3D icon.

Stable Diffusion 3.5 Large

+ Good isometric perspective and diorama base structure.
+ Vibrant colors and a high variety of sushi types.

− Failed the text placement requested, attaching it to a flag instead of top-center.
− Texture quality feels more like plastic than 'refined PBR' materials.
− The text on the main flag is slightly aliased and low-quality compared to the rest of the scene.

Verdict: Qwen Image 2512 followed the prompt instructions near-perfectly, placing the text and flag icon exactly where requested with a very clean, high-quality 3D render style. Stable Diffusion 3.5 Large struggled with the text placement and rendered the scene with a busier, less refined look that missed the 'minimal garnish' requirement.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

Community vote: Qwen Image 2512 100%, ties 0%, Stable Diffusion 3.5 Large 0%

AI Judge Analysis

Qwen Image 2512

+ Excellent fur texture and fine detail on all four animals.
+ Successfully includes all four requested animals (dog, cat, bunny, fox) in a clear composition.
+ Very high resolution with sharp focus on the central subjects.

− The composition is a static 'family portrait' style rather than the requested 'playfully chasing' and 'tumbling' action.
− The lighting feels a bit more artificial/digital compared to the more natural atmosphere in the other image.

Stable Diffusion 3.5 Large

+ Captures the 'chasing' and 'tumbling' movement requested in the prompt perfectly.
+ Breathtaking atmospheric lighting with beautiful bokeh, dew sparkles, and god rays.
+ High level of cuteness and dynamic energy that fits the 'joyful wholesome vibe'.

− The kitten's anatomy is slightly generic and looks more like a small fox/cat hybrid.
− Slightly lower sharpness on the fur textures compared to the other model.

Verdict: While Qwen Image 2512 produces a sharper image with better-defined textures for each specific animal, it fails to capture the 'chasing and tumbling' action requested. Stable Diffusion 3.5 Large is the superior choice here because it perfectly interprets the dynamic energy and atmospheric lighting of the prompt, creating a much more evocative and 'wholesome' scene.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

Community vote: Qwen Image 2512 100%, ties 0%, Stable Diffusion 3.5 Large 0%

AI Judge Analysis

Qwen Image 2512

+ Excellent typography with correct spelling of 'Caffè'.
+ Highly detailed and polished vector emblem style with great use of shading.
+ Perfect integration of the 'Est. 1720' banner into the overall composition.

− The detail level leans more toward 'vintage illustration' than 'minimalist' logo.

Stable Diffusion 3.5 Large

+ Closer adherence to the 'minimalist' aspect of the prompt with flat icons.
+ Subtle texture on the background adds a nice aged paper feel.

− Spelling error in 'Cafféé Florian' with an extra 'e'.
− The steam and cloche icons are poorly integrated and visually disjointed.
− The typography on the banner is less elegant than Qwen Image 2512's.

Verdict: Qwen Image 2512 produces a much more professional and aesthetically pleasing logo with perfect spelling and high-quality vintage typography. While Stable Diffusion 3.5 Large attempts a more minimalist approach, it fails due to a spelling error ('Cafféé') and a clunky graphic design where elements appear floating and disconnected.

Next steps

Explore each model

Qwen Image 2512

Alibaba

Improved version of Alibaba's Qwen image model with better text rendering, finer natural textures, and more realistic human generation.

Stable Diffusion 3.5 Large

Stability AI

Stability AI's 8.1-billion-parameter Multimodal Diffusion Transformer (MMDiT) text-to-image model, featuring improved image quality, typography, complex prompt understanding, and resource efficiency.
