Grok Imagine Image vs Wan 2.6
Head-to-head across 11 challenges
Grok Imagine Image
51.9%
win rate
Ties
3.7%
Wan 2.6
44.4%
win rate
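For readers who want to check the headline figures, here is a minimal Python sketch of how a win rate can be tallied from a list of outcomes. The split of 14 wins, 1 tie, and 12 wins over 27 comparisons is an assumption, chosen only because it reproduces the percentages above after rounding; the page does not state the underlying counts. `win_rates` is a hypothetical helper, not part of either model's tooling.

```python
from collections import Counter


def win_rates(outcomes):
    """Return each outcome's share of all comparisons as a percentage (one decimal)."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Assumed tally (not stated by the source): 27 comparisons in total.
outcomes = ["Grok Imagine Image"] * 14 + ["Tie"] * 1 + ["Wan 2.6"] * 12

rates = win_rates(outcomes)
for name, pct in rates.items():
    print(f"{name}: {pct}%")
```

Under that assumed tally, the three shares come out at 51.9%, 3.7%, and 44.4%.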
Challenge Results
Man and Car in California
Editing: “Make a photo of the man driving the car down the California coastline”
AI Judge Analysis
Grok Imagine Image
- + Excellent preservation of the car's model and specific design details
- + High visual quality with realistic motion blur and lighting
- + Successfully replaces the entire environment with a convincing California coastline
- − Completely failed to include the man from the second source image, using a generic placeholder instead
Wan 2.6
- + Excellent character preservation, accurately incorporating the specific man and his clothing from the source image
- + Strong composition that places the subject and car naturally within the requested environment
- + Maintains the luxury convertible aesthetic of the original car
- − The car model changed slightly from the original (interior dashboard and grilles are different)
Verdict: Wan 2.6 is the clear winner because it successfully integrated both source images by placing the specific man from the second image into the car from the first. Grok Imagine ignored the second source image entirely, rendering a generic driver that did not resemble the requested person.
Modern Clean Menu
Text-to-Image: “Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”
AI Judge Analysis
Grok Imagine Image
- + Strictly followed the category requirements for Appetizers, Pizza, and Mains.
- + Excellent usage of white space and professional sans-serif typography.
- + Sophisticated integration of food photography with the layout, using varying crops and positions.
- − Considerable amount of text repetition and gibberish in the item descriptions.
- − The 'Pizza' section contains several items that are clearly not pizzas (e.g., Steak Frites).
Wan 2.6
- + Successfully implemented the 'grid' layout requested in the prompt.
- + Higher quality, more realistic food photography with consistent lighting.
- + Includes pricing details which adds to the realism of a restaurant menu.
- − Failed to include a dedicated 'Appetizers' list, only having a heading above pizza photos.
- − Significant text rendering issues including garbled letters and nonsensical prices (e.g., $0.09).
- − Overall layout feels more like a flyer or advertisement than a functional menu.
Verdict: Grok Imagine Image followed the structural requirements of the prompt much better, providing distinct sections for all three requested categories and maintaining a clean, professional aesthetic. While Wan 2.6 has superior photographic quality and followed the 'grid' instruction, the actual content of the menu is disorganized and fails to provide the requested variety of food sections. Grok Imagine Image is the winner for its better adherence to the complex layout and categorization instructions.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
Grok Imagine Image
- + Perfectly rendered text with no spelling errors or repetitions.
- + Excellent chalk texture and smear effects for a realistic chalkboard look.
- + Very clean layout and composition that remains legible and professional.
- − The handwriting is slightly more 'neat' than 'elegant cursive' for the title.
- − The background cafe scene is very blurry and relatively generic.
Wan 2.6
- + Authentic-looking chalk dust accumulation at the bottom of the frame.
- + The heading captures the requested cursive style more effectively.
- + Excellent handwriting variation that feels genuinely human-drawn.
- − The prices '- $24' and '- $28' each appear twice for their items.
- − Inconsistent line spacing makes the board feel somewhat cluttered.
- − The 'cookies' item has a slight spelling/character merging issue in 'Chocolate'.
Verdict: Grok Imagine followed the complex text instructions perfectly, providing a clean, error-free menu with great chalk texture. Wan 2.6 had a more artistic and realistic cursive style with authentic dust details, but it failed on text accuracy by repeating prices and having slightly inconsistent spacing. Grok Imagine is the winner for its superior text rendering and adherence to the specific menu contents.
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
Grok Imagine Image
- + Excellent character expressions that perfectly match the 'bored' and 'professional' descriptions
- + Strong lighting consistency between the interior cabin and the New York street lights
- + High level of detail on the capybara's fur and the passenger's coat
- − The passenger appears to be in the front seat rather than the back seat as requested
- − The steering wheel placement is slightly awkward relative to the capybara's paws
Wan 2.6
- + Correctly places the human passenger in the back seat
- + More authentic NYC taxi driver hat design and vehicle decals
- + Great use of external rain/wetness effects to add realism to the atmosphere
- − The capybara's hands look more like primate hands than capybara paws
- − The interior of the car looks significantly more worn/dirty than a typical commercial scene
- − The resolution and sharpness are slightly lower than Grok Imagine Image's
Verdict: Both models captured the whimsical prompt well, but they succeeded in different areas of composition. Grok Imagine Image produced a higher quality, more photorealistic image with better expressions, but failed to place the passenger in the back seat. Wan 2.6 followed the spatial instructions perfectly and captured the gritty NYC atmosphere, though it suffered from anatomical issues with the capybara's paws.
Bald man challenge
Image Editing: “Give the person a full, thick head of natural hair with realistic texture, density, and a natural hairline. Preserve facial features and lighting.”
AI Judge Analysis
Grok Imagine Image
- + Excellent source preservation, keeping the face and background identical.
- + Very natural and believable hair texture and density.
- + Matches the hair color and lighting of the existing beard perfectly.
- − The hair is a bit thin in the front compared to the 'thick' request, though more realistic.
Wan 2.6
- + Successfully added a very thick and full head of hair as requested.
- + Good integration of the hair with the sideburns and beard.
- − Subtly altered facial features, making the person look younger and changing the eye area.
- − The hair thickness looks slightly artificial/wig-like along the top edge.
Verdict: Grok Imagine Image provides a much more successful edit by perfectly preserving the original person's identity and facial structure while adding realistic hair. Wan 2.6 followed the 'thick' instruction more aggressively but at the cost of altering the subject's face and original character.
Isometric Miniature Diorama Scenes
Text-to-Image: “Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”
AI Judge Analysis
Grok Imagine Image
- + Perfect adherence to the 45-degree isometric perspective and diorama base request.
- + Excellent text rendering and layout with consistent font and centered flag icon.
- + Higher variety and density of sushi models while maintaining a clean aesthetic.
- − The lighting is a bit harsh with very dark, sharp shadows compared to the 'gentle' request.
- − The plate cuts off slightly at the edge of the blue base.
Wan 2.6
- + Beautiful soft lighting and 'gentle' shadows that match the prompt perfectly.
- + High-quality PBR-like materials, especially on the wooden board and wasabi texture.
- + Very clean and modern graphic design for the text and flag.
- − The camera angle is slightly lower than the requested 45-degree top-down isometric view.
- − Text placement is a bit tight with the flag squeezed next to 'SUSHI' rather than being a separate small element.
Verdict: Both models followed the prompt well, but Grok Imagine Image captured the specific '45-degree isometric' look and 'diorama' feel more accurately. While Wan 2.6 had superior soft lighting and material textures that felt more premium, Grok Imagine Image's composition and precise adherence to the requested layout make it the winner for this specific challenge.
Over-the-top cartoon caricature
Editing: “Create a caricature of me and my job. Make it exaggerated and humorous, incorporating my profession as a tv show anchor and my love for dogs and hockey.”
AI Judge Analysis
Grok Imagine Image
- + Retains an incredible facial resemblance to the woman in the source image.
- + Successfully interprets the 'caricature' style with the classic big-head/small-body aesthetic.
- + Perfectly integrates all requested themes: TV news desk, hockey rink background, and dogs with hockey gear.
- − The hockey stick in the dog's mouth is slightly warped.
- − The transition between the photographic face and the illustrated body is a bit jarring.
Wan 2.6
- + Strong, cohesive illustrative art style across the entire image.
- + Creative inclusion of a secondary dog (pug) in a hockey jersey.
- + Clear depiction of all requested elements: TV anchor equipment, dogs, and hockey gear.
- − Loses the specific facial likeness of the source image, looking like a generic cartoon character.
- − The left hand holding the hockey stick is anatomically awkward relative to the arm position.
Verdict: Grok Imagine Image is the superior choice because it successfully maintains the identity of the person in the source image, transforming her face into a recognizable caricature. While Wan 2.6 creates a fun illustration, it fails the 'caricature of me' aspect by replacing the user with a generic cartoon character. Grok's inclusion of specific details like the 'Sports Scoop' papers and the dogs on ice skates makes for a more clever interpretation of the prompt.
Studio Ghibli Anime Style
Editing: “Transform this photo into a Studio Ghibli–inspired illustration. Use soft pastel colors, hand-painted textures, gentle lighting, dreamy backgrounds, and a warm, nostalgic mood”
AI Judge Analysis
Grok Imagine Image
- + Captures the iconic Studio Ghibli 'painted' background style with fluffy white clouds and blue skies.
- + Maintains high character fidelity, accurately preserving the poses, expressions, and clothing of the original meme.
- + The color palette is vibrant yet warm, consistent with Ghibli's summer-themed films.
- − The line art on the characters is a bit thin compared to the classic bold Ghibli linework.
- − The man's facial features feel slightly more generic than the source's expressive pout.
Wan 2.6
- + Successfully applies a beautiful watercolor 'wash' texture across the entire image.
- + Excellent hand-drawn aesthetic with soft, expressive line art.
- + The lighting is very gentle and dreamy, perfectly matching the 'soft pastel' request.
- − The addition of white bokeh/sparkle dots feels more like generic shoujo anime than specific Ghibli style.
- − The background is very washed out and loses the architectural detail present in the source.
- − The transition between the man's arm and the woman on the right is slightly messy.
Verdict: Grok Imagine Image is the winner because it perfectly balances the requested Studio Ghibli aesthetic with excellent source preservation. It maintains the specific layout, expressions, and colors of the original 'distracted boyfriend' meme while transforming the environment into a rich, hand-painted Ghibli world. Wan 2.6 provides a beautiful watercolor illustration, but it loses too much background detail and adds distracting sparkles that deviate from the Ghibli art style.
Golden Hour Stroll
Image Editing: “Add dynamic motion to this photo: make hair blow in the wind, add leaves flying, energetic and lively feel.”
AI Judge Analysis
Grok Imagine Image
- + Excellent source preservation of the woman and dog
- + Highly energetic feel with numerous leaves
- − Leaves look like floating stickers rather than being part of the environment
- − Dog's ears are slightly distorted to simulate movement
Wan 2.6
- + Natural looking wind effect on the hair
- + Leaves are integrated realistically with motion blur
- + Near-perfect preservation of the original image details
- − Fewer leaves make the scene feel slightly less 'energetic' than instructed
Verdict: Grok Imagine followed the prompt by adding a very large number of leaves, but they look like a flat overlay on the original image. Wan 2.6 provided a much more subtle and realistic edit, with hair that flows naturally and leaves that have appropriate motion blur, making it the superior technical edit.
Vintage Cafe Logo
Text-to-Image: “Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”
AI Judge Analysis
Grok Imagine Image
- + Excellent typography with proper accent on 'Caffè'
- + Superior vector-style illustration with clean linework
- + Accurate rendering of the requested text elements
- − Shows the 'Est. 1720' text twice, which wasn't requested
- − Includes some ambiguous shapes behind the cloche
Wan 2.6
- + Good use of the requested banner element for the date
- + Pleasant vintage border texture on the background
- + Simple and clear composition
- − Text rendering for 'Caffè' uses a generic font and the letter spacing is slightly off
- − The cloche illustration is less refined and feels more like clip-art
Verdict: Grok Imagine Image provides a much more professional-grade logo with superior typography and high-quality vector illustrations that feel like a real brand identity. While Wan 2.6 follows the banner instruction well, the overall execution lacks the polish and typographic elegance found in the Grok output.
Apollo 11: Journey to Tranquility
Text-to-Image: “Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vector style, crisp lines, consistent iconography, subtle gradients only. Steps (stop at landing): 1. Launch (Saturn V icon) 2. Earth Orbit (Earth + orbit ring icon) 3. Translunar (trajectory arc icon) 4. Lunar Orbit (Moon + orbit ring icon) 5. Descent (lunar module descending icon) 6. Landing (lunar module on the surface icon) Small supporting elements (minimal text): • Crew strip: three silhouette icons with only last names: Armstrong, Aldrin, Collins. • Landing site marker: Moon pin labeled "Tranquility" only. Layout constraints: generous margins, large readable labels, clean background with subtle stars. Vector-only, print-poster look, high resolution.”
AI Judge Analysis
Grok Imagine Image
- + Successfully included all six requested infographic steps with corresponding icons.
- + Followed the color palette and flat-vector style perfectly.
- + Text and labels are mostly legible and properly arranged.
- − Contains minor AI spelling artifacts in the secondary text (e.g., '3rajoory', 'Moom').
- − The Saturn V icon is generic rather than realistic.
Wan 2.6
- + Clean, minimalist aesthetic with clear text rendering.
- + Good use of white space and profile silhouettes for the crew.
- − Complete failure to follow the core instruction of creating a 6-step infographic.
- − Missed all specific icons requested (Saturn V, orbit rings, trajectory arc, lunar module).
- − Lacks the complex informational density required by the prompt.
Verdict: Grok Imagine followed the complex prompt instructions near-perfectly, creating a detailed 6-step infographic with the requested icons and NASA-inspired color scheme. In contrast, Wan 2.6 failed to generate an infographic at all, producing a simple poster with only the crew names and none of the requested technical steps or specific icons.
Grok Imagine Image
An image generation model by xAI designed to generate highly aesthetic images from text descriptions.
Wan 2.6
Alibaba's multimodal generation model from the Wan AI suite, supporting text-to-video, image-to-video, reference-to-video with audio, and text-to-image, in both Chinese and English.