GPT Image 1.5 vs Wan 2.6
Head-to-head across 10 challenges
GPT Image 1.5: 80.0% win rate
Ties: 6.7%
Wan 2.6: 13.3% win rate
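The page does not say how many individual judgments feed the percentages above. As a rough sketch of the arithmetic, the snippet below computes win rates from a tally and searches for the smallest tally that reproduces the displayed figures. The function names `win_rates` and `smallest_tally` are hypothetical, and the assumption that each figure is a count divided by the total, rounded to one decimal place, is not confirmed by the source.

```python
def win_rates(wins_a, ties, wins_b):
    """Return (A win %, tie %, B win %), each rounded to one decimal place."""
    total = wins_a + ties + wins_b
    return tuple(round(100 * n / total, 1) for n in (wins_a, ties, wins_b))


def smallest_tally(displayed, max_total=100):
    """Find the smallest (wins_a, ties, wins_b) whose rates match `displayed`.

    Assumption: the page rounds count / total to one decimal place.
    Returns None if no tally up to max_total matches.
    """
    for total in range(1, max_total + 1):
        for wins_a in range(total + 1):
            for ties in range(total - wins_a + 1):
                wins_b = total - wins_a - ties
                if win_rates(wins_a, ties, wins_b) == displayed:
                    return wins_a, ties, wins_b
    return None


print(win_rates(12, 1, 2))                   # (80.0, 6.7, 13.3)
print(smallest_tally((80.0, 6.7, 13.3)))     # (12, 1, 2)
```

Under that rounding assumption, the smallest matching tally has 15 judgments (12 wins, 1 tie, 2 losses), so the percentages are probably not computed from one vote per challenge across the 10 challenges. The actual counts may be any multiple of this tally.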
Challenge Results
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
GPT Image 1.5
- + Excellent close-up detail on the capybara's fur and facial expression.
- + Precise adherence to the 'taxi' text on the cap.
- + The lighting is warm and cinematic, consistent with a night taxi ride.
- − The capybara's anatomy on the steering wheel looks slightly cramped due to the tight framing.
- − The background passenger is more blurred than in the competing image.
Wan 2.6
- + Dynamic composition that shows more of the car exterior and the rainy Manhattan environment.
- + Excellent realism in the textures of the coat and the rainy window glass.
- + Captures the 'bored' expression of the passenger very effectively.
- − The capybara's paws on the steering wheel look somewhat distorted and unnatural.
- − The hat has a generic police-style badge rather than saying 'TAXI' as requested by the prompt's context.
Verdict: GPT Image 1.5 provides a much more intimate and detailed character study with perfect text rendering on the hat, while Wan 2.6 offers a wider, more atmospheric scene that captures the rainy New York vibe. GPT Image 1.5 is preferred for its superior rendering of the capybara and better adherence to specific costume details.
Man and Car in California
Editing: “Make a photo of the man driving the car down the California coastline”
AI Judge Analysis
GPT Image 1.5
- + Excellent preservation of the man's facial features and unique hairstyle.
- + Accurately places the car on a winding coastal road that fits the California theme.
- + Good integration of the interior car details like the white leather seats.
- − The car is positioned on the wrong side of the road for US driving.
- − The steering wheel is oddly low and small relative to the man's hands.
Wan 2.6
- + Great sense of motion with high-quality background blur on the wheels and road.
- + The composition is more dynamic and cinematic for a car photo.
- + Preserves the man's likeness and clothing patterns very well.
- − The steering wheel and hands are slightly distorted and messy.
- − The man's hair is slightly simplified compared to the source image.
Verdict: Both models successfully combined the man and the car into the requested California coastline setting while preserving their key characteristics. GPT Image 1.5 has better facial fidelity but places the car on the left side of the road, whereas Wan 2.6 provides a much more convincing action shot with superior lighting and a realistic driving perspective.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
GPT Image 1.5
- + Excellent text accuracy with no spelling errors.
- + Consistent and elegant handwriting style that looks highly realistic.
- + Perfect spacing and layout on the board.
- − The 'chalkboard dust' effect is very uniform, looking slightly digital in its distribution.
Wan 2.6
- + Excellent chalk texture and realistic smudging on the board surface.
- + More dynamic and authentic-looking café environment context with visible lighting.
- + Strong 'handwritten' aesthetic with charming character variation.
- − Repeating price tags for each item creates a cluttered and repetitive layout.
- − Slight punctuation issues such as the trailing comma after 'Specials'.
Verdict: GPT Image 1.5 provides near-perfect text rendering and adheres strictly to the layout instructions with clean, elegant handwriting. Wan 2.6 has superior surface textures and a more authentic 'chalk' feel but suffers from repetitive text elements (double pricing) and slightly messy composition.
Modern Clean Menu
Text-to-Image: “Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”
AI Judge Analysis
GPT Image 1.5
- + Perfectly legible English text with accurate menu item names and descriptions.
- + Excellent layout balance where the food photos correspond directly to the adjacent text sections.
- + High-quality, appetizing food photography with clear focus and vibrant colors.
- − The grid for photos is vertical on one side rather than a centralized grid mentioned in the prompt.
Wan 2.6
- + Stronger adherence to the 'grid' request for the layout of food images.
- + Vibrant colorful accents on the borders create a modern aesthetic.
- − Text is largely gibberish with numerous spelling errors and artifacts.
- − The scaling of the text is inconsistent and messy in the 'Mains' column.
- − Visual quality of the food items is lower, with some blurring and strange shapes.
Verdict: GPT Image 1.5 is the clear winner as it produces a professional, usable menu with perfect text rendering and high-quality food photography. Wan 2.6 captures the 'grid' layout and colorful accents well, but fails significantly on text legibility and overall professional finish.
Bald man challenge
Image Editing: “Give the person a full, thick head of natural hair with realistic texture, density, and a natural hairline. Preserve facial features and lighting.”
AI Judge Analysis
GPT Image 1.5
- + Excellent preservation of facial features and identity.
- + Realistic curly hair texture that matches the beard and eyebrows.
- + Maintains original lighting and image sharpness perfectly.
- − The hairline transition on the forehead is a bit sharp/uniform.
Wan 2.6
- + Impressive volume and thickness of hair.
- + Matches the overall color palette of the image well.
- − Substantially alters facial features, making the man look younger and like a different person.
- − The hair texture appears slightly too soft/painterly compared to the gritty detail of the source beard.
- − The hairline is unnaturally low, encroaching on the glasses.
Verdict: GPT Image 1.5 is the clear winner as it successfully adds the hair while perfectly preserving the identity, facial features, and detail levels of the original subject. In contrast, Wan 2.6 essentially generates a new face that resembles the original but loses the specific characteristics of the source person, which fails the preservation requirement of an image editing task.
Over-the-top cartoon caricature
Editing: “Create a caricature of me and my job. Make it exaggerated and humorous, incorporating my profession as a tv show anchor and my love for dogs and hockey.”
AI Judge Analysis
GPT Image 1.5
- + Excellently captures the subject's facial features in a caricature style.
- + Cleverly integrates all three requested elements (news, dogs, hockey) into a cohesive scene.
- + High-quality rendering with professional digital painting aesthetics.
- − The fingers on the hand holding the microphone are anatomically messy.
Wan 2.6
- + Functional caricature style that includes all requested elements.
- + Good use of space to show the full character and setting.
- − The character's face bears very little resemblance to the source image provided.
- − The hockey stick is being held in a physically impossible way by both the human and the dog.
Verdict: GPT Image 1.5 is the clear winner as it successfully maintains the subject's likeness while translating her into a caricature, whereas Wan 2.6 creates a generic cartoon face. GPT Image 1.5 also provides a more creative composition, integrating the hockey element into the background news ticker and a dog's accessory rather than just having characters hold a stick.
Studio Ghibli Anime Style
Editing: “Transform this photo into a Studio Ghibli–inspired illustration. Use soft pastel colors, hand-painted textures, gentle lighting, dreamy backgrounds, and a warm, nostalgic mood”
AI Judge Analysis
GPT Image 1.5
- + Excellent adherence to the 'soft pastel' and 'warm, nostalgic' mood.
- + Captures a dreamlike aesthetic that fits more modern Ghibli styles.
- + Maintains the characteristic expressions of the source image perfectly within an illustrative style.
- − The image is a bit too blurry/hazy, losing some structural detail.
- − The texture feels more like a digital filter than a hand-painted medium.
Wan 2.6
- + Superior watercolor and hand-painted texture that evokes classic Ghibli backgrounds.
- + Very high fidelity to the source image's composition and character details.
- + Clearer linework and cleaner facial rendering.
- − The color palette is a bit cooler and less 'nostalgic' than requested.
- − Characters look slightly more Western-realistic than the Ghibli anime style usually dictates.
Verdict: Both models did an excellent job of translating the 'Distracted Boyfriend' meme into an illustrative style. GPT Image 1.5 captured the warm, dreamy atmosphere of a Ghibli film better, while Wan 2.6 provided much more convincing hand-painted textures and watercolor effects that feel like authentic production art.
Golden Hour Stroll
Image Editing: “Add dynamic motion to this photo: make hair blow in the wind, add leaves flying, energetic and lively feel.”
AI Judge Analysis
GPT Image 1.5
- + Excellent adherence to the 'flying leaves' part of the prompt with high density.
- + Strong dynamic hair movement that looks natural for a windy day.
- + Perfectly preserves the source image subject and background.
- − The leaves appear somewhat flat and lack motion blur relative to their quantity.
- − Some leaves overlap the subjects in a slightly busy way.
Wan 2.6
- + Successfully adds wind-blown hair movement.
- + Maintains high fidelity to the original source image.
- + Subtle and clean integration of a few leaves.
- − The leaf count is very low, making the 'energetic and lively' feel less apparent.
- − The background remains static, lacking the full atmosphere requested.
Verdict: Both models did an excellent job of preserving the source image while adding the requested hair movement. GPT Image 1.5 is the clear winner for its commitment to the 'lively' atmosphere, adding a significant number of flying leaves that transform the mood of the photo, whereas Wan 2.6 was too conservative with the leaves.
Vintage Cafe Logo
Text-to-Image: “Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”
AI Judge Analysis
GPT Image 1.5
- + Perfect text rendering for both the name and the date banner.
- + Excellent vector emblem style with a professional, balanced layout.
- + Includes a stippled texture that perfectly matches the 'vintage' prompt.
- − The 'steam' element is a bit thick, looking more like a solid shape than vapor.
Wan 2.6
- + Good color palette following the brown and cream request.
- + Clean vector-style illustration of the cloche dome.
- + Background texture aligns well with the 'subtle texture' prompt.
- − The 'Est. 1720' banner is awkwardly placed and small.
- − The typography is much more generic compared to GPT Image 1.5.
- − The steam lines are slightly disconnected and thin.
Verdict: GPT Image 1.5 followed the prompt much more effectively, producing a cohesive logo with professional typography and a well-integrated 'Est. 1720' banner. Wan 2.6 produced a decent image, but the banner placement was awkward and the overall composition lacked the sophisticated 'vintage minimalist' feel requested.
Apollo 11: Journey to Tranquility
Text-to-Image: “Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vector style, crisp lines, consistent iconography, subtle gradients only. Steps (stop at landing): 1. Launch (Saturn V icon) 2. Earth Orbit (Earth + orbit ring icon) 3. Translunar (trajectory arc icon) 4. Lunar Orbit (Moon + orbit ring icon) 5. Descent (lunar module descending icon) 6. Landing (lunar module on the surface icon) Small supporting elements (minimal text): • Crew strip: three silhouette icons with only last names: Armstrong, Aldrin, Collins. • Landing site marker: Moon pin labeled "Tranquility" only. Layout constraints: generous margins, large readable labels, clean background with subtle stars. Vector-only, print-poster look, high resolution.”
AI Judge Analysis
GPT Image 1.5
- + Successfully included all six requested infographic steps in order.
- + Excellent text rendering for labels like 'LAUNCH', 'TRANSLUNAR', and astronaut names.
- + High-quality vector style and consistent iconography that matches the requested NASA palette.
- − The 'Descent' and 'Landing' modules are nearly identical in appearance.
- − Some minor overlapping of graphic elements in the Earth Orbit section.
Wan 2.6
- + Clean aesthetic with a clear NASA-inspired color palette.
- + Readable text for the title and astronaut names.
- − Completely failed to include the requested six-step infographic sequence.
- − Missing all requested icons (Saturn V, orbit rings, trajectory arc, lunar module).
- − Composition is mostly empty space with very little informational value.
Verdict: GPT Image 1.5 is the clear winner as it followed every detail of the complex prompt, creating a full six-step infographic with accurate vector icons and clear text. In contrast, Wan 2.6 failed to provide the infographic steps, offering only a minimalist poster with astronaut names that ignored the bulk of the instructional prompt.
GPT Image 1.5
OpenAI's state-of-the-art image generation model with better instruction following and adherence to prompts
Wan 2.6
Alibaba's multimodal generation model from the Wan AI suite, supporting text-to-video, image-to-video, reference-to-video with audio, and text-to-image, in both Chinese and English