Grok Imagine Image (xAI) vs. Qwen Image 2.0 Pro (Alibaba)

Settled by community votes across 2 shared challenges, with an AI judge weighing in on each.

Grok Imagine Image

24.1 arena score

#19 of 44 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.

Qwen Image 2.0 Pro

22.3 arena score

#27 of 44 in Text-to-Image

Vote tally

Where the votes landed

Grok Imagine Image: 100.0% win rate

Ties: 0.0%

Qwen Image 2.0 Pro: 0.0% win rate


Shared challenges: 2
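For readers who want to reproduce the win-rate figures, here is a minimal sketch of the arithmetic. It assumes each percentage is simply that outcome's share of all votes cast, rounded to one decimal place. The page does not publish raw vote counts or its exact formula, so the function name `vote_shares` and the counts in the usage note are hypothetical.

```python
def vote_shares(wins_a: int, ties: int, wins_b: int) -> tuple[float, ...]:
    """Return (win rate A, tie rate, win rate B) as percentages.

    Assumption: each rate is the outcome's share of all votes,
    rounded to one decimal place. The arena's real formula is not
    documented on this page.
    """
    total = wins_a + ties + wins_b
    if total == 0:
        raise ValueError("no votes recorded")
    return tuple(round(100 * n / total, 1) for n in (wins_a, ties, wins_b))
```

With illustrative counts, `vote_shares(3, 1, 1)` gives a 60/20/20 split. Any tally in which one model takes every vote reproduces the 100.0% / 0.0% / 0.0% result shown above.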

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Chalkboard Menu

Text-to-Image

“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”

Grok Imagine Image

Qwen Image 2.0 Pro

Grok Imagine Image: 100% wins · Ties: 0% · Qwen Image 2.0 Pro: 0% wins

AI Judge Analysis

Grok Imagine Image

+ Excellent text rendering with perfect spelling across all lines.
+ Highly realistic chalkboard texture with authentic chalk dust and smudging.
+ Cozy atmosphere accentuated by lighting and background depth.

− The 'elegant cursive' for the title is more of a print-style sans-serif compared to the other text.
− The handwriting looks slightly too uniform, almost like a digital chalk font.

Qwen Image 2.0 Pro

+ Successfully captured the 'natural variations in letter size' and slanted handwriting styles.
+ Excellent spatial layout with the text feeling very organic and hand-drawn.
+ Accurate rendering of the menu items and pricing as requested.

− Minor spelling error in the third item where 'Butter' is missing the second 't' (rendered as 'Buter').
− The chalkboard surface is slightly less detailed than Grok Imagine Image's.

Verdict: Both models followed the complex prompt with high accuracy, but Grok Imagine Image is the winner due to its flawless spelling and superior background rendering. While Qwen Image 2.0 Pro captured the natural 'handwritten' irregularities better, it misspelled 'Butter', making Grok Imagine Image the more reliable choice for text-intensive tasks.

The Capybara Taxi Driver

Text-to-Image

“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”

Grok Imagine Image

Qwen Image 2.0 Pro

AI Judge Analysis

Grok Imagine Image

+ Excellent photorealism and cinematic lighting.
+ Perfectly captures the 'bored' expression of the passenger as requested.
+ Consistent texture on the capybara's fur and clothing.

− The passenger is sitting in the front seat instead of the back seat.
− The capybara's paws look more like bird talons or sharp claws than a capybara's actual feet.

Qwen Image 2.0 Pro

+ Greater attention to the 'back seat' positioning for the passenger.
+ Interesting use of motion blur in the background lights to suggest movement.
+ The capybara taxi cap is more detailed and authentic to a driver's uniform.

− The image quality is significantly lower with noticeable grain and less realistic skin/fur textures.
− The woman is holding her phone in a slightly awkward, distorted way.
− The capybara's hands look more like human hands than paws.

Verdict: Grok Imagine Image wins on sheer visual quality and atmosphere, despite failing to place the passenger in the back seat. Qwen Image 2.0 Pro followed the spatial instructions more accurately but suffered from lower resolution, muddy textures, and a less convincing 'photorealistic' appearance.