OpenAI's previous-generation image model, with higher quality than DALL-E 2 and support for larger resolutions.
Settled by community votes across 8 shared challenges, with an AI judge weighing in on each.
DALL-E 3
#35 of 44 in Text-to-Image
Qwen Image 2512
#26 of 44 in Text-to-Image
Where the votes landed
DALL-E 3: 0.0% win rate
Ties: 0.0%
Qwen Image 2512: 100.0% win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Geometric Composition
Text-to-Image: “A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”
AI Judge Analysis
DALL-E 3
- + Excellent high-detail rendering of the wood and glass textures
- + Cinematic lighting with strong contrast
- + Artistic interpretation of the blue sphere containing a miniature world
- − Failed the spatial reasoning of the prompt by putting the book inside the cube
- − The cube has a wooden frame not mentioned in the prompt
- − The sphere is on top of the book rather than just inside the cube
Qwen Image 2512
- + Perfect adherence to the spatial requirements of the prompt
- + Accurately places the book on top of the cube and the sphere inside
- + Correctly captures the 'soft window light' lighting style
- − Simple, less detailed visual style compared to the competitor
- − The glass cube has some reflection inconsistencies on the internal surfaces
Verdict: While DALL-E 3 produced a more visually stunning and high-detail image, it completely failed to follow the requested spatial layout, placing the red book inside the cube instead of on top. Qwen Image 2512 followed every instruction perfectly, accurately placing each object and capturing the specific lighting requested, making it the better response to the prompt.
Candid Street Photography
Text-to-Image: “A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”
AI Judge Analysis
DALL-E 3
- + Strong cinematic composition with a meaningful puddle reflection.
- + Excellent use of foreground elements to create depth.
- − The character looks slightly stylized and more like a painting than a photograph.
- − Subject appears to be barefoot in the rain which feels illogical.
Qwen Image 2512
- + Highly realistic skin textures and clothing details.
- + Accurate representation of a red city bicycle and natural facial expression.
- − Composition is a bit centered and lacks the requested 'motion blur' on passing cars.
- − The man is posing for a portrait rather than appearing to 'repair' the bike.
Verdict: Qwen Image 2512 wins on photographic realism and natural skin texture, feeling like a genuine 50mm snapshot despite the posing. DALL-E 3 captures a more artistic mood and better environmental effects like reflections, but the character rendering is too illustrative for the 'no stylization' requirement.
Modern Clean Menu
Text-to-Image: “Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”
AI Judge Analysis
DALL-E 3
- + Strong minimalist aesthetic with high white space usage
- + Dynamic grid composition that integrates food photos with text
- + Includes distinct sections as requested in the prompt
- − Text is largely illegible gibberish
- − The layout is presented as four separate variations rather than one single menu mockup
Qwen Image 2512
- + Highly organized and professional menu structure
- + The food photography is high-quality and consistent in style
- + Excellent use of bold sans-serif fonts and vibrant accent colors
- − Several spelling errors in headings like 'Appetiizers' and 'Piesmants'
- − The grid of eight photos at the top feels slightly repetitive
Verdict: Both models followed the prompt well, but Qwen Image 2512 provided a much more realistic and usable layout for a casual dining menu. While DALL-E 3 captured a more artistic, minimalist vibe, it failed to produce legible text or a cohesive single-page design, whereas Qwen Image 2512 succeeded in creating a balanced, professional, and vibrant document.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
DALL-E 3
- + Excellent chalk texture and artistic flourishes.
- + Captures the lighting and cozy café atmosphere well.
- − Significant spelling errors and gibberish text throughout the board.
- − Failed to correctly list the requested menu items and prices.
- − Text layout is cluttered and difficult to read.
Qwen Image 2512
- + Highly accurate text rendering with almost perfect spelling.
- + Followed all prompt instructions including the specific date and menu items.
- + Consistent and convincing handwritten chalk style across all text.
- − Minor spelling error in 'Risitto' (Risotto).
- − Composition is a bit clinical and lacks the ornate flourishes seen in DALL-E 3's image.
Verdict: Qwen Image 2512 is the clear winner as it successfully rendered the specific text and prices requested in the prompt with high legibility. DALL-E 3 produced a visually atmospheric image but failed significantly on the text-to-image challenge, resulting in numerous spelling errors and incorrect menu content.
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
DALL-E 3
- + Excellent texture on the capybara's fur and the taxi interior
- + Cinematic lighting with realistic reflections on the animal's face
- + Creative background detail with a 'CAPYBARA' sign in the distance
- − Completely missing the human businesswoman passenger requested in the prompt
- − Capybara is facing the side rather than forward through the windshield
Qwen Image 2512
- + Includes all requested elements including the bored businesswoman and the phone
- + Strong adherence to the requested composition with the capybara facing forward
- + Correct placement of the yellow taxi driver cap
- − The paws on the steering wheel look suspiciously like human hands in gloves
- − Lower overall resolution and clarity compared to the competitor
Verdict: While DALL-E 3 produces a higher quality image with superior textures and lighting, it failed significantly on prompt adherence by omitting the passenger entirely. Qwen Image 2512 followed all instructions, capturing the surreal scene of a capybara driver and a bored passenger looking at a phone, making it the better response for this specific challenge.
Isometric Miniature Diorama Scenes
Text-to-Image: “Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”
AI Judge Analysis
DALL-E 3
- + Excellent 3D miniature aesthetic with soft, rounded textures.
- + Clean and simple diorama base design.
- + Very high clarity and vibrant color palette.
- − Failed to place text at the top-center as requested.
- − The word 'SUSHI' is missing entirely.
- − Included more garnishes than the 'minimal' request specified.
Qwen Image 2512
- + Followed text instructions perfectly, including 'JAPAN', 'SUSHI', and flag icon at top-center.
- + Realistic PBR materials are evident in the rice and fish textures.
- + Accurate 45-degree isometric projection.
- − The diorama base is a bit large compared to the plate.
- − Transition between the 'grass' and the plate looks slightly cluttered.
Verdict: While DALL-E 3 produced a very charming 3D model, it failed significantly on the specific text placement and content instructions. Qwen Image 2512 followed every instruction in the prompt, including the complex text layout and specific icons, while maintaining high visual quality and a sophisticated miniature diorama feel.
Adorable Baby Animals in Sunny Meadow
Text-to-Image: “Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”
AI Judge Analysis
DALL-E 3
- + Excellent adherence to the 'joyful' and 'wholesome' vibe with expressive, cute expressions.
- + Dynamic action with animals reaching for butterflies and interacting.
- + Strong lighting effects with prominent god rays.
- − Has a very stylized, 'Pixar-like' 3D render look rather than photorealistic.
- − The butterflies have bizarre furry bodies and animal-like faces.
- − Anatomy is simplified and cartoonish.
Qwen Image 2512
- + Much closer to the 'hyper-photorealistic' part of the prompt with realistic fur textures and features.
- + Anatomy of the kitten, puppy, and fox is highly accurate to real-life counterparts.
- + Beautiful natural lighting and depth of field.
- − The animals are largely posing for a portrait rather than 'tumbling' and 'chasing' as requested.
- − The rabbit's placement is a bit cramped between the puppy's paws.
Verdict: While DALL-E 3 captures the whimsical action and joyful energy of the prompt much better, it fails the request for photorealism, looking instead like a 3D animation. Qwen Image 2512 produces a stunningly realistic image with life-like textures and lighting, though it opted for a static pose rather than the requested tumbling action.
Vintage Cafe Logo
Text-to-Image: “Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”
AI Judge Analysis
DALL-E 3
- + Strong vector emblem aesthetic
- + Excellent use of negative space for the cloche
- − Failed to include the specific name 'Caffè Florian'
- − Text layout is somewhat crowded
Qwen Image 2512
- + Perfect adherence to the requested text name
- + Classic illustrative style that feels authentically vintage
- − Steam effect is slightly overcomplicated for a minimalist logo
- − Texture is very subtle
Verdict: Qwen Image 2512 is the clear winner as it followed the text instructions perfectly, whereas DALL-E 3 substituted the requested name with 'Coffee House'. Qwen Image 2512 also provided a more elegant banner and cloche illustration that fits the 'Caffè Florian' brand well.
Explore each model
Improved version of Alibaba's Qwen image model with better text rendering, finer natural textures, and more realistic human generation.