xAI's premium image generation model, offering higher-fidelity output and stronger performance on single-image editing benchmarks than the standard Grok Imagine model
Settled by community votes across 4 shared challenges, with an AI judge weighing in on each.
Grok Imagine Image Pro
#14 of 44 in Text-to-Image
Wan 2.7
#34 of 44 in Text-to-Image
Where the votes landed
Grok Imagine Image Pro
100.0%
win rate
Ties
0.0%
Wan 2.7
0.0%
win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
Grok Imagine Image Pro
- + Excellent adherence to the 'chalk texture' requirement with grainy, realistic strokes.
- + Superior handwriting realism with natural variations in letter size and spacing.
- + Accurately represents the smudges and dust typically found on a real chalkboard.
- − The cursive in the title is more of a print-cursive hybrid than elegant cursive.
- − The bottom text line starts to look more like a digital font compared to the main items.
Wan 2.7
- + Perfectly legible text with very clean character rendering.
- + The layout is well-balanced with decorative dividers.
- + Successfully captures the 'elegant cursive' style for the title.
- − Text looks like a digital font 'sticker' rather than actual chalk on a board.
- − Lacks the requested chalk texture, appearing too smooth and uniform.
- − Shadowing/glow behind the letters makes it look like a graphic design rather than a physical object.
Verdict: Grok Imagine Image Pro followed the stylistic instructions much better, producing a board that looks like it was actually written on by a human with chalk. Wan 2.7 has better layout and prettier cursive, but the text is far too perfect and clean, ultimately failing the requirement for a realistic chalk texture with no digital font appearance.
The Reversed Rodeo
Text-to-Image: “Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”
AI Judge Analysis
Grok Imagine Image Pro
- + Successfully followed the specific spatial instruction for the horse to be on top.
- + Vibrant, cinematic color palette with high-quality nebula effects.
- + Creative and surreal interpretation of the prompt.
- − The horse is floating/leaping over the astronaut rather than strictly 'riding' him.
- − Slightly messy anatomy where the horse's back hooves meet the galaxy background.
Wan 2.7
- + Excellent anatomical detail on the horse and space suit.
- + Clear, high-resolution textures.
- + Good lighting and shadows consistent with the celestial environment.
- − Failed the negative constraint: the astronaut is riding the horse, not vice versa.
- − The interpretation is literal and common, lacking the requested surreal reversal.
Verdict: The main differentiator was the complex spatial requirement of having the 'horse on top.' Grok Imagine Image Pro successfully attempted this surreal arrangement, creating a cinematic and vibrant scene, while Wan 2.7 defaulted to the standard trope of an astronaut riding a horse, completely ignoring the specific instruction to reverse the roles.
Outfit Transfer Challenge
Editing: “Use Image 1 as the base person. Dress them in the exact elaborate outfit from Image 2 (including all layers, accessories, jewelry, and shoes). Carefully adapt the clothing to the body shape and pose in Image 1 while maintaining realistic fabric behavior, correct proportions, and perfect lighting/shadow matching. Keep the person’s exact face, hair, and background completely unchanged.”
AI Judge Analysis
Grok Imagine Image Pro
- + Excellent preservation of the subject's face, skin markings, and the background environment.
- + Seamless integration of the elaborate clothing with the body's pose and lighting.
- − Completely failed to use the specified 'Image 2' outfit (a casual navy pea coat and scarf), instead generating an unrelated royal garment.
- − The skin tone on the hands is significantly lighter than the face.
Wan 2.7
- + Preserved the subject's face, hair, and the background very well.
- + Maintained the vitiligo markings on the hands, showing higher attention to detail for consistency.
- − Completely failed to use the outfit from Image 2, generating a black embroidered coat instead.
- − The shadow on the ground does not match the new pose and legs as realistically as the original.
Verdict: Both models failed significantly on the primary instruction of the editing task, which was to use the specific outfit from Image 2 (a pea coat and scarf). Instead, both Grok Imagine Image Pro and Wan 2.7 generated their own interpretations of 'elaborate' clothing. Grok is slightly better in terms of lighting and photographic realism, while Wan 2.7 is better at maintaining the skin details (vitiligo) of the subject's hands.
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
Grok Imagine Image Pro
- + Excellent photographic quality with realistic depth of field and lighting.
- + Paws are correctly positioned on the steering wheel as requested.
- + The interior details like the fare meter and dashboard elements are highly convincing.
- − The capybara's front right paw has some anatomical blending issues with the steering wheel.
Wan 2.7
- + Captures a more classic chauffeur-style taxi hat.
- + Good facial expressions on both the capybara and the passenger.
- − The passenger is sitting in the front passenger seat instead of the back seat as requested.
- − The lighting on the capybara's fur feels slightly artificial compared to the background.
- − Strange paw/claw anatomy where they grip the wheel.
Verdict: Grok Imagine Image Pro followed the spatial instructions much better, correctly placing the passenger in the back seat and providing a more realistic interior perspective. Wan 2.7 failed the prompt by placing the passenger in the front seat and displayed less realistic lighting and texture integration.
Explore each model
Alibaba's Wan 2.7 image generation and editing model for text-to-image, reference-guided generation, and instruction-based image edits