GPT Image 1.5 (OpenAI) vs. Qwen Image 2512 (Alibaba)

Settled by community votes across 7 shared challenges, with an AI judge weighing in on each.

GPT Image 1.5

26.5 arena score

#7 of 44 in Text-to-Image

Top 3 in Image Editing

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.

Qwen Image 2512

22.4 arena score

#26 of 44 in Text-to-Image

Vote tally

Where the votes landed

GPT Image 1.5: 83.3% win rate

Ties: 0.0%

Qwen Image 2512: 16.7% win rate

Shared challenges 7
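The win-rate figures above are shares of the total votes cast. The page does not publish raw vote counts, so the sketch below uses hypothetical counts (10, 0, 2) chosen only because they reproduce the displayed percentages. The function name `vote_shares` is likewise an illustration and not part of the arena's tooling.

```python
def vote_shares(wins_a: int, ties: int, wins_b: int) -> tuple:
    """Return (model A %, tie %, model B %) rounded to one decimal place."""
    total = wins_a + ties + wins_b
    if total == 0:
        raise ValueError("no votes recorded")
    return tuple(round(100 * n / total, 1) for n in (wins_a, ties, wins_b))


# Hypothetical counts, not taken from the page: 10 votes for one model,
# 0 ties, 2 votes for the other.
print(vote_shares(10, 0, 2))  # (83.3, 0.0, 16.7)
```

Any vote split in a 5:0:1 ratio gives the same percentages, so the displayed figures alone do not reveal how many votes were cast.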

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

Community votes: GPT Image 1.5 75%, ties 0%, Qwen Image 2512 25%

AI Judge Analysis

GPT Image 1.5

+ Perfect adherence to the prompt's spatial instructions.
+ Excellent glass refraction showing the plant through the cube.
+ High-quality texture on the red book and wooden table.

− The sphere is quite large relative to the cube, making the 'small' descriptor subjective.
− The base of the cube looks slightly mirror-like rather than simple glass.

Qwen Image 2512

+ Good lighting and realistic material textures.
+ Follows the prompt for all required elements.

− Significant optical inconsistency where a second blue sphere appears to be inside the glass wall.
− The sphere is off-center, making the composition feel less balanced.
− The glass has a heavy teal tint compared to the clear glass in GPT Image 1.5's output.

Verdict: GPT Image 1.5 is the clear winner as it correctly handles the transparency and refractions of the glass cube, showing the plant behind it naturally. Qwen Image 2512 suffers from a major hallucination where a duplicate blue sphere is embedded within the glass pane on the right, which ruins the realism of the scene.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

Community votes: GPT Image 1.5 100%, ties 0%, Qwen Image 2512 0%

AI Judge Analysis

GPT Image 1.5

+ Excellent adherence to the 'repairing' action with tools visible on the ground.
+ Highly realistic lighting and reflections on the wet pavement.
+ Perfectly captured the candid feel with the subject engrossed in work.

− The car in the background is sharp rather than showing the requested motion blur.
− The rain is very subtle, almost difficult to see.

Qwen Image 2512

+ Strong bokeh and shallow depth of field as requested.
+ Effective motion blur on the background vehicles.
+ High facial detail and natural skin texture.

− The subject is posing for a portrait rather than 'repairing' the bicycle as requested.
− The bicycle anatomy is slightly warped, particularly the handlebars and frame connection.
− Minimal visual evidence of rain beyond wet ground.

Verdict: GPT Image 1.5 is the winner because it correctly depicts the 'repairing' action with a candid, storytelling atmosphere, whereas Qwen Image 2512 produces a static portrait that ignores the core activity of the prompt. While Qwen better follows instructions for motion blur, the fundamental failure to show the man working makes it a less successful interpretation of the scene.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

Community votes: GPT Image 1.5 100%, ties 0%, Qwen Image 2512 0%

AI Judge Analysis

GPT Image 1.5

+ Excellent text rendering with no spelling errors.
+ Includes specific logical categories and descriptions for items.
+ High-quality, realistic food photography that matches the text labels.

− The 'grid' layout for photos is a bit irregular.
− Image is cropped tightly at the bottom.

Qwen Image 2512

+ Strong minimalist aesthetic with a clean grid layout.
+ Effective use of white space and bold typography.

− Contains significant gibberish text and spelling errors.
− Combines pizza and mains into one incoherent section.
− The food images are repetitive and low-detail.

Verdict: GPT Image 1.5 is the clear winner as it produces a fully functional, professional-grade menu with perfect text and logically categorized items. Qwen Image 2512 follows the requested grid layout more strictly but fails significantly on text legibility and content accuracy.

Chalkboard Menu

Text-to-Image

“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”

AI Judge Analysis

GPT Image 1.5

+ Perfect spelling of all menu items including 'Risotto'.
+ Excellent chalk texture with realistic smudges and dusty residues.
+ Very realistic handwriting that feels organic rather than mechanically uniform.

− The layout is a bit sparse with significant empty space at the bottom.
− The handwriting style is slightly less 'elegant cursive' for the title as requested.

Qwen Image 2512

+ Strong composition that fills the board effectively with a cozy cafe background.
+ Beautifully rendered cursive calligraphy for the title and items.
+ Text is very high contrast and easy to read.

− Misspells 'Risotto' as 'Risitto'.
− The chalk looks slightly like a digital brush rather than a physical chalk stick, compared to GPT Image 1.5's output.

Verdict: GPT Image 1.5 wins on technical accuracy and realism, correctly spelling all menu items and providing a more authentic chalk-on-blackboard texture. While Qwen Image 2512 has a more appealing composition and background, the spelling error in 'Risitto' makes it less useful for a final output.

The Capybara Taxi Driver

Text-to-Image

“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”

AI Judge Analysis

GPT Image 1.5

+ Excellent photorealism with cinematic lighting and depth of field.
+ The capybara's fur and expression are highly detailed and convincing.
+ Superior adherence to the specific hat design with legible 'TAXI' text.

− The capybara's paws look slightly more like hands/claws than natural capybara anatomy.

Qwen Image 2512

+ Successfully captures all prompt elements including the businesswoman and the driver.
+ Good balance in the composition showing more of the car interior.

− The capybara's front paws are incorrectly rendered, showing too many digits and a humanoid structure.
− Lighting on the capybara's face feels a bit flat compared to the background.

Verdict: GPT Image 1.5 is the winner due to its superior photorealistic textures and lighting, which make the absurd scene feel more grounded. While both models followed the prompt instructions well, the fine details in the capybara's fur and the legibility of the taxi hat text give GPT Image 1.5 the edge over Qwen Image 2512.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

Community votes: GPT Image 1.5 75%, ties 0%, Qwen Image 2512 25%

AI Judge Analysis

GPT Image 1.5

+ Captures the 'tumbling' and 'playful' aspect of the prompt much better than Qwen Image 2512.
+ Excellent rendering of golden hour light, god rays, and atmospheric dew sparkles.
+ Highly expressive and varied facial expressions on each animal.

− Anatomical issues with the kitten, including an extra paw and a confusing orientation of limbs.
− The butterfly's scale is a bit large compared to the animals.

Qwen Image 2512

+ Clean and anatomically correct subjects with very clear, high-resolution textures.
+ Well-organized composition that ensures all four animals are clearly visible and front-facing.
+ Beautifully detailed fur and sharp focus on the faces.

− The animals are posing for a portrait rather than 'playfully chasing and tumbling' as requested.
− The lighting and environment feel more static and less magical than the god rays in GPT Image 1.5's output.
− Butterflies feel somewhat pasted onto the scene rather than part of the action.

Verdict: GPT Image 1.5 does a much better job of capturing the joyful, dynamic energy of animals tumbling and chasing butterflies in a magical atmosphere, though it suffers from significant AI artifacts in the kitten's anatomy. Qwen Image 2512 produces a much cleaner, higher-quality technical image with perfect anatomy, but it ignores the action-oriented parts of the prompt in favor of a static group portrait. GPT Image 1.5 is the preferred choice for its superior interpretation of the 'wholesome vibe' and interaction requested.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

AI Judge Analysis

GPT Image 1.5

+ Perfect adherence to all text requirements including 'Caffè Florian' and 'Est. 1720'.
+ Clean, minimalist vector aesthetic that works well as a logo.
+ Professional layout with a well-integrated banner.

− Slightly generic steam effect compared to the illustrative quality of the other model.

Qwen Image 2512

+ Beautiful hand-drawn illustrative style with great woodcut texture.
+ High-quality rendering of the smoke/steam.
+ Accurate text rendering for both terms.

− The composition is a bit more crowded than a typical 'minimalist' logo.
− Small artifacts in the crossbar of the 'F' in Florian.

Verdict: Both models followed the prompt exceptionally well, producing accurate text and thematic elements. GPT Image 1.5 is preferred for its cleaner, more 'minimalist' vector logo aesthetic, whereas Qwen Image 2512 leaned more into a complex illustration style that is slightly less practical for a versatile logo.