OpenAI's state-of-the-art image generation model with better instruction following and adherence to prompts
Settled by community votes across 5 shared challenges, with an AI judge weighing in on each.
GPT Image 1.5
#7 of 44 in Text-to-Image
Wan 2.7
#34 of 44 in Text-to-Image
Where the votes landed
GPT Image 1.5
0%
win rate
Ties
0%
Wan 2.7
0%
win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
GPT Image 1.5
- + Text has a very realistic dry chalk texture with smudges.
- + The handwriting looks authentic and non-digital.
- + Perfectly follows all text prompts including completing the truncated 'Brown But...' item.
- − The composition is a tight crop showing only the board.
- − Light source is a bit uneven at the top.
Wan 2.7
- + Excellent environmental context with warm café lighting and background.
- + High contrast text that is very easy to read.
- + Accurately rendered all requested text including price details.
- − The text looks like a clean digital font rather than natural chalk.
- − The 'handwriting' is too uniform and lacks the grain/texture requested in the prompt.
Verdict: GPT Image 1.5 followed the stylistic requirements much better, producing highly realistic chalk textures and natural handwriting variations. Wan 2.7 created a more aesthetically pleasing scene with a better background, but its text reads as a digital font overlay rather than actual chalk on a board.
The Reversed Rodeo
Text-to-Image: “Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”
AI Judge Analysis
GPT Image 1.5
- + Excellent dynamic lighting and highly detailed textures on both the horse and spacesuit.
- + Strong cinematic composition with realistic-looking dust and moon surface effects.
- + Great attention to detail in the horse's tack and anatomy.
- − Failed the specific spatial instruction; the astronaut is riding the horse instead of the horse being on top.
Wan 2.7
- + Clean, clear resolution with an interesting surrealist vibe in the background.
- + Good use of color and balance between the earth and the deep space elements.
- + Accurately represents an astronaut and horse set in space.
- − Failed the specific spatial instruction; the horse is being ridden by the astronaut.
- − Anatomical issues with the horse's rear right leg looking disconnected and rubbery.
Verdict: Both models failed the spatial instruction to have the 'horse on top', instead defaulting to the traditional 'astronaut riding horse' trope. GPT Image 1.5 produced the superior image thanks to its much richer textures, cinematic lighting, and more believable anatomy, compared to the flatter, more artificial look of Wan 2.7.
Outfit Transfer Challenge
Editing: “Use Image 1 as the base person. Dress them in the exact elaborate outfit from Image 2 (including all layers, accessories, jewelry, and shoes). Carefully adapt the clothing to the body shape and pose in Image 1 while maintaining realistic fabric behavior, correct proportions, and perfect lighting/shadow matching. Keep the person’s exact face, hair, and background completely unchanged.”
AI Judge Analysis
GPT Image 1.5
- + Excellent transfer of the specific jacket and scarf pattern
- + Maintains high resolution and realistic fabric textures
- − Crop fails to show the full person, violating the instruction to keep background unchanged
- − Completely cut off the subject's face
- − Changed the lighting and color grade of the overall scene
Wan 2.7
- + Keeps the full subject, pose, and background intact
- + Preserves the subject's face and unique features perfectly
- + Understands the prompt as a full-body outfit replacement task
- − Failed to use the specific outfit from Image 2, generating a generic gold-embroidered coat instead
- − Lower visual fidelity compared to the source image
Verdict: GPT Image 1.5 successfully captured the specific clothing items from Image 2 but failed the core editing task by cropping out the subject's head and changing the image composition. Wan 2.7 followed all structural instructions regarding the subject's identity and pose but failed the visual reference task by replacing the requested outfit with a completely different style. Wan 2.7 is the likely winner because it actually performed the edit on the original person and scene, whereas GPT Image 1.5 effectively generated a new image that discarded the source face and framing.
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
GPT Image 1.5
- + Excellent adherence to the prompt's specified viewing angle from inside the taxi.
- + Highly realistic texture on the capybara's fur and the taxi driver's cap.
- + Perfectly captures the 'bored' expression of the businesswoman in the background.
- − The capybara's paws look more like human-animal hybrid hands with dark fingers.
- − Slightly messy rendering of the taxi's dashboard in the foreground.
Wan 2.7
- + Strong cinematic lighting and sharp details on the exterior of the vehicle.
- + Captures the bored businesswoman effectively in the passenger seat.
- − Violates the prompt by placing the passenger in the front seat instead of the back.
- − The viewing angle is from outside the car, whereas the prompt requested a scene from 'inside'.
- − The capybara's fur texture appears stylistically illustrated/rendered rather than photorealistic.
Verdict: GPT Image 1.5 is the clear winner because it correctly follows the complex spatial instructions of the prompt, placing the passenger in the back and viewing the scene from inside the cabin. Wan 2.7 fails on composition by placing the passenger in the front seat and using an exterior camera angle, and its capybara looks significantly less realistic than the one generated by GPT Image 1.5.
The Halloween Invitation
Text-to-Image: “Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”
AI Judge Analysis
GPT Image 1.5
- + Excellent atmospheric lighting and texture that matches the 'vintage gothic' request
- + Perfect text rendering of all requested details without spelling errors
- + Strong cinematic composition with a central glowing focal point
- − The thorns and webs are a bit dense, making the border look slightly cluttered
Wan 2.7
- + Clean layout with clear separation between different elements
- + Added creative elements like the cauldron and crows that fit the theme
- + Followed all text instructions accurately including the scroll banner
- − The 'Est. 1847' text was not requested and feels out of place
- − The lighting is flat and more illustrative than 'cinematic'
- − The bright parchment creates a cartoonish rather than 'dark gothic' mood
Verdict: GPT Image 1.5 is the clear winner for its superior atmospheric rendering and adherence to the 'dark parchment' and 'moody' descriptors, creating a cohesive gothic aesthetic. While Wan 2.7 has accurate text, it feels more like a modern illustration than a vintage gothic poster, lacking the depth and cinematic lighting found in GPT Image 1.5.
Explore each model
Alibaba's Wan 2.7 image generation and editing model for text-to-image, reference-guided generation, and instruction-based image edits