Grok Imagine Image vs Qwen Image 2512
Head-to-head across 8 challenges
Grok Imagine Image: 47.1% win rate
Ties: 0.0%
Qwen Image 2512: 52.9% win rate
Challenge Results
Geometric Composition
Text-to-Image: “A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”
AI Judge Analysis
Grok Imagine Image
- + Excellent photographic realism with natural lighting and depth of field.
- + Accurately represents the transparency and refraction of the glass cube.
- + Follows all spatial instructions including plant placement and lighting direction.
- − The sphere appears to be floating mid-air inside the cube without any support.
Qwen Image 2512
- + Strong adherence to the spatial requirements of the prompt.
- + The sphere is realistically resting on the bottom surface of the cube.
- + Good texture on the book and wooden table.
- − The glass cube has strange internal reflections that look more like mirrors than transparent glass.
- − The glass edges are overly thick and tinted green, making the cube look slightly less realistic than Grok Imagine's.
Verdict: Both models followed the complex spatial prompt perfectly. Grok Imagine produced a more aesthetically pleasing, high-end photographic result with superior glass physics, though it chose to make the sphere float. Qwen Image 2512 provided a more grounded interpretation with the sphere resting on the base, but the glass transparency and reflections were less convincing.
Candid Street Photography
Text-to-Image: “A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”
AI Judge Analysis
Grok Imagine Image
- + Excellent depiction of motion blur from passing cars
- + Authentic 'imperfect framing' that feels like a real street snapshot
- + Accurate 50mm lens perspective and shallow depth of field
- − The subject's face is obscured and he is wearing a mask
- − The bike mechanics are a bit messy upon close inspection
Qwen Image 2512
- + Excellent natural skin texture and facial detail
- + Strong adherence to the 'elderly Japanese man' subject
- + Very high visual quality and clear subject focus
- − Fails the 'imperfect framing' prompt by centering the subject perfectly
- − Subject is posing rather than 'repairing' the bike as requested
- − Motion blur on the cars is less pronounced and less realistic than in Grok Imagine's image
Verdict: Grok Imagine followed the stylistic cues of the prompt much better, capturing the 'imperfect framing,' motion blur, and candid nature of a street photograph perfectly. While Qwen Image 2512 produced a high-quality portrait with great skin texture, it ignored the 'repairing' action and 'imperfect framing' requests, resulting in a staged look rather than a candid one.
Modern Clean Menu
Text-to-Image: “Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”
AI Judge Analysis
Grok Imagine Image
- + Excellent text legibility and mostly correct spelling
- + Sophisticated layout with high-quality food photography integrated into the design
- + Adheres well to all category requirements (Appetizers, Pizza, Mains)
- − Some minor repetition in dish names
- − The food images are not in a strict 'grid' as requested, though cleverly integrated
Qwen Image 2512
- + Strict adherence to the 'grid' layout for food photos
- + Clean minimalist aesthetic that resembles a real menu template
- − Garbled, unreadable text throughout the menu
- − Merged sections ('Pizza/Means'), which create a cluttered list
- − Generic 'Modern Restaurant' title lacks branding appeal
Verdict: Grok Imagine Image produced a far superior result that functions as a realistic menu with legible text and appetizing photography. While Qwen Image 2512 followed the 'grid' instruction more literally, its text is completely illegible and it failed to separate the categories correctly.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
Grok Imagine Image
- + Excellent chalk texture with realistic dusty residue on the board
- + Perfect spelling for all requested menu items
- + Natural variation in letter sizing and spacing that feels authentically handmade
- − The title lettering is closer to print than to the requested 'elegant cursive'
Qwen Image 2512
- + Captures the 'elegant cursive' style for the title much better
- + Clean and highly legible layout
- + Strong rendering of the requested date and price details
- − Typo in 'Risitto' (should be Risotto)
- − The handwriting feels slightly more digital and less like raw chalk compared to the other model
Verdict: Both models followed the complex text instructions very well, with Grok Imagine providing a more authentic chalk texture and perfect spelling. Qwen Image 2512 better captured the requested elegant cursive style for the title, but it misspelled 'Risotto' as 'Risitto', and its text looks a bit too polished for real chalk.
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
Grok Imagine Image
- + Excellent photorealism and skin textures on the human passenger.
- + Accurately places the passenger in the back seat as requested.
- + High level of detail on the capybara's fur and the taxi dashboard.
- − The passenger's hands interacting with the phone are slightly mangled.
- − The capybara's paws look more like bird talons or primate hands than capybara paws.
Qwen Image 2512
- + Excellent 'professional' capybara driver hat with a badge.
- + The capybara's paws on the steering wheel are more anatomically grounded.
- + Great lighting and reflections on the windshield and taxi roof.
- − The passenger appears to be in the front passenger seat rather than the back seat.
- − The passenger's facial expression is slightly distorted and reads as 'unhappy' rather than 'bored.'
Verdict: Grok Imagine Image followed the spatial instructions more closely by placing the businesswoman in the back seat, creating a more realistic taxi layout. While Qwen Image 2512 had a better hat design and more realistic capybara paws, it failed to place the passenger in the rear, which was a specific requirement of the prompt.
Isometric Miniature Diorama Scenes
Text-to-Image: “Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”
AI Judge Analysis
Grok Imagine Image
- + Perfect text rendering and layout placement
- + Clean, high-quality isometric execution
- + Excellent adherence to the 'solid light blue background' requirement
- − The diorama base is a bit simplistic compared to the modeling of the sushi
- − The plate edge is slightly thin/fragile looking
Qwen Image 2512
- + Highly detailed textures and 3D modeling on the sushi and garnish
- + Beautifully realized diorama base with organic foliage details
- + Excellent soft lighting and material depth
- − The flag icon is placed to the right of the text rather than 'below' or 'top-center' as implied
- − Includes extra garnish not specifically requested (wasabi, ginger, leaves)
Verdict: Grok Imagine Image followed the technical layout and text instructions perfectly, providing a very clean professional graphic. Qwen Image 2512 provided a much more visually rich and artistic 3D diorama with superior textures, though its text and icon placement were slightly less accurate to the prompt.
Adorable Baby Animals in Sunny Meadow
Text-to-Image: “Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”
AI Judge Analysis
Grok Imagine Image
- + Excellent depiction of god rays and sunrise lighting.
- + High contrast and vibrant colors that enhance the 'wholesome' vibe.
- + Includes all four requested animals clearly.
- − The animals look somewhat 'plastic' or AI-stylized rather than hyper-photorealistic.
- − The butterflies are represented as tiny white specks rather than detailed insects.
- − The bunny has fox-like features and coloring.
Qwen Image 2512
- + More realistic anatomical proportions for the animals.
- + Butterflies are clearly rendered and well-integrated into the scene.
- + Better 'tumbling' interaction between the kitten, rabbit, and puppy.
- − The fox looks slightly older than a 'kit' compared to the other animals.
- − The kitten's eye orientation is slightly off.
Verdict: Qwen Image 2512 is the superior choice as it achieves a much higher level of photorealism and correctly renders the butterflies mentioned in the prompt. While Grok Imagine captures the 'god rays' more dramatically, its animals appear overly smoothed and stylized, and it fails to deliver detailed butterflies. Qwen Image 2512 also captures a more believable interaction between the animals.
Vintage Cafe Logo
Text-to-Image: “Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”
AI Judge Analysis
Grok Imagine Image
- + Perfectly legible and crisp typography.
- + Clean vector emblem style that adheres to the minimalist request.
- + Accurate text rendering for both the name and the date.
- − Shows the 'Est. 1720' text twice, which was not requested.
- − The 'banner' for the date is very abstract and lacks the classic banner feel.
Qwen Image 2512
- + Excellent vintage engraving/hatching details on the cloche.
- + Strong artistic rendering of steam that feels more 'classic'.
- + Includes the requested banner element for the 'Est. 1720' text.
- − The text 'Caffè' has a slightly awkward connection between the floral 'f' and 'e'.
- − Less minimalist than requested, leaning more into a complex illustration.
Verdict: Grok Imagine Image provides a very clean, modern minimalist logo that is highly functional and has perfect typography, though it redundantly includes the date twice. Qwen Image 2512 offers a much richer artistic style with beautiful vintage textures and a proper banner, which feels more authentic to a historic 1720 establishment, despite being slightly less 'minimalist'.
Grok Imagine Image
An image generation model by xAI designed to generate highly aesthetic images from text descriptions.
Qwen Image 2512
Improved version of Alibaba's Qwen image model with better text rendering, finer natural textures, and more realistic human generation.