OpenAI's state-of-the-art image generation model with arbitrary resolution up to 4K and strong instruction following
Settled by community votes across 3 shared challenges, with an AI judge weighing in on each.
GPT Image 2
#3 of 44 in Text-to-Image
Not enough comparable category data
The chart appears once both models have ratings across at least three shared arena categories.
Qwen Image 2.0 Pro
#27 of 44 in Text-to-Image
Where the votes landed
GPT Image 2: 0% win rate
Ties: 0%
Qwen Image 2.0 Pro: 0% win rate
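As an illustration of how percentages like these could be derived, here is a minimal sketch that turns a list of votes into win and tie shares. The page does not describe its actual tallying method, so the function name `vote_shares`, the `"tie"` label, and the choice to report 0% for every bucket when no votes exist are all assumptions made for this example.

```python
from collections import Counter


def vote_shares(votes, model_a, model_b):
    """Return percentage shares for model_a wins, ties, and model_b wins.

    `votes` is a list of strings, each equal to model_a, model_b, or "tie".
    This is a hypothetical helper, not the arena's real scoring code.
    """
    counts = Counter(votes)
    total = counts[model_a] + counts[model_b] + counts["tie"]
    if total == 0:
        # Assumption: with no votes recorded, every share is shown as 0%.
        return {model_a: 0.0, "tie": 0.0, model_b: 0.0}
    return {
        model_a: 100 * counts[model_a] / total,
        "tie": 100 * counts["tie"] / total,
        model_b: 100 * counts[model_b] / total,
    }


if __name__ == "__main__":
    print(vote_shares([], "GPT Image 2", "Qwen Image 2.0 Pro"))
```

Under these assumptions, an empty vote list yields 0% in all three buckets, which would be one possible explanation for the all-zero figures shown here.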
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
GPT Image 2
- + Excellent chalk texture with realistic dusty edges and letter variations.
- + Perfect adherence to the full menu list and pricing requested.
- + Coherent and warm café environment with a well-framed wooden board.
- − The cursive in the header is a bit stiff compared to professional chalk art.
Qwen Image 2.0 Pro
- + Clear and legible handwriting with good layout balance.
- + Strong chalk smudge effects that add to the realism of a used board.
- + Accurately rendered all requested text without spelling errors.
- − The letter texture looks slightly more like a digital paint brush than dry chalk.
- − The composition is at a sharp angle, making the text slightly harder to read than a straight-on shot.
Verdict: Both models followed the complex text prompt perfectly, with no spelling errors even in the long menu items. GPT Image 2 (Model A) is the winner because its chalk texture is significantly more realistic, capturing the grainy, dusty essence of physical chalk on slate better than Qwen Image 2.0 Pro (Model B), which has a smoother, more digital appearance.
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
GPT Image 2
- + Captures a very cinematic, moody lighting style consistent with a night taxi
- + Excellent depth of field with realistic background bokeh
- + Highly detailed fur texture and plausible capybara anatomy
- − Composition is tight and crops out most of the cab's interior details
- − The capybara's left paw is somewhat indistinct on the wheel
Qwen Image 2.0 Pro
- + Includes more environmental details like the TLC license plate and dashboard
- + The passenger's expression and action perfectly match the 'bored businesswoman' prompt
- + Shows both paws clearly on the steering wheel as requested
- − Visible spelling error on the dashboard sticker ('Licesed')
- − The lighting is slightly flat and looks less like a single photorealistic shot
Verdict: GPT Image 2 offers superior aesthetic quality and lighting, creating a more believable cinematic scene. However, Qwen Image 2.0 Pro followed the specific compositional requirements more closely, including small details like the TLC license and a clearer view of both paws on the wheel. GPT Image 2 is generally preferred for its visual realism despite the tighter crop.
The Halloween Invitation
Text-to-Image: “Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”
AI Judge Analysis
GPT Image 2
- + Excellent typography with a perfect vintage gothic aesthetic
- + Complex and detailed composition featuring a custom NYC backdrop in the architecture
- + Perfectly captures the parchment texture and cinematic lighting requested
- − The '3' in the date is slightly stylized but still legible
Qwen Image 2.0 Pro
- + Creative use of green lighting for the jack-o-lantern
- + Clean and legible text layout
- + Good adherence to specific elements like the border of webs and thorns
- − The digital art style feels more modern than the requested 'vintage' and 'parchment' look
- − Bats have visible artifacts/clipping issues near the wings
- − The 'Halloween' text is slightly off-center
Verdict: GPT Image 2 perfectly captures the requested vintage gothic mood with sophisticated typography and a high level of background detail, including clever references to the NYC location. Qwen Image 2.0 Pro is colorful and meets all prompt requirements but leans too much into a clean, modern digital illustration style rather than the dark parchment poster aesthetic requested.
Explore each model
Alibaba's Qwen Image 2.0 Pro model offering higher quality image generation with enhanced detail and accuracy