An image generation model by xAI designed to generate highly aesthetic images from text descriptions.
Settled by community votes across 3 shared challenges, with an AI judge weighing in on each.
Grok Imagine Image
#19 of 44 in Text-to-Image
Wan 2.7
#34 of 44 in Text-to-Image
Where the votes landed
Grok Imagine Image: 0% win rate
Ties: 0%
Wan 2.7: 0% win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
Grok Imagine Image
- + Excellent chalk texture with realistic dusty strokes and smudges on the board.
- + Authentic handwritten variation with a natural slant and inconsistent baseline.
- + Perfectly rendered text that precisely matches the requested prompt, including the date.
- − The 'elegant cursive' request for the title was interpreted as uppercase print, though it still looks handwritten.
Wan 2.7
- + Perfect text accuracy for every word in the prompt.
- + Attractive composition with decorative dividers.
- + Clean and legible presentation.
- − Text looks like a digital font or a sticker rather than actual chalk on the surface.
- − Lacks the characteristic texture, powder residue, and stroke pressure of real chalk.
- − No 'elegant cursive' applied to the title as requested.
Verdict: Grok Imagine is the clear winner because it successfully captured the 'chalk texture' and 'handwritten style' requested in the prompt, looking like a real physical object. Wan 2.7 produced very accurate text, but the rendering is flat and looks like a digital overlay, failing to meet the requirement for a realistic chalk texture.
Outfit Transfer Challenge
Editing: “Use Image 1 as the base person. Dress them in the exact elaborate outfit from Image 2 (including all layers, accessories, jewelry, and shoes). Carefully adapt the clothing to the body shape and pose in Image 1 while maintaining realistic fabric behavior, correct proportions, and perfect lighting/shadow matching. Keep the person’s exact face, hair, and background completely unchanged.”
AI Judge Analysis
Grok Imagine Image
- + Excellent preservation of the source person's face and upper hair detail.
- + The outfit is intricate and fits the royal theme.
- + Maintains the pose and background from Image 1 perfectly.
- − Completely failed to use the clothing from Image 2 (a modern pea coat and scarf).
- − The hand and rings look somewhat distorted and unnatural.
Wan 2.7
- + Attempts to follow the pose and maintain the background from Image 1.
- + Includes various jewelry and layers as requested.
- + Preserves the face and vitiligo markings accurately.
- − Failed to use the correct outfit from Image 2 (a modern pea coat and scarf).
- − The hands and fingers are significantly distorted with extra digits/anatomical errors.
Verdict: Both models failed the specific 'Image 2' constraint, completely ignoring the modern blue pea coat and plaid scarf in favor of generic 'elaborate' historical/fantasy costumes. Grok Imagine produced a much cleaner and higher quality image overall, whereas Wan 2.7 had severe anatomical issues with the hands and feet.
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
Grok Imagine Image
- + Excellent adherence to the 'back seat' positioning for the passenger.
- + Realistic lighting and photorealistic textures for both the capybara and the businesswoman.
- + Accurate depiction of a bored, normal expression on the human.
- − The passenger is technically in the passenger seat rather than the back seat, based on car proportions shown.
Wan 2.7
- + High level of detail in the taxi driver cap and the capybara's fur.
- + Good interpretation of the 'bored' expression on the passenger.
- − The passenger is clearly in the front passenger seat, which contradicts the 'back seat' prompt.
- − The perspective and composition are slightly more cramped and less cinematic.
- − The passenger's hands and phone interaction look less natural.
Verdict: Grok Imagine followed the prompt more effectively by placing the passenger further back and maintaining a professional, cinematic composition that felt more like a New York scene. Wan 2.7 failed to put the passenger in the back seat, placing her directly next to the driver instead, and had less convincing human-to-object interaction.
Explore each model
Alibaba's Wan 2.7 image generation and editing model for text-to-image, reference-guided generation, and instruction-based image edits