GPT Image 1 Mini (OpenAI) vs. Wan 2.7 (Alibaba)

Settled by community votes across 3 shared challenges, with an AI judge weighing in on each.

GPT Image 1 Mini

25.3 arena score

#12 of 44 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.
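
The availability rule above can be sketched in a few lines. This is a minimal illustration, not the site's actual implementation: the function names, the dictionary layout, and the choice to treat each model's arena score as a Text-to-Image category rating are all assumptions made for the example.

```python
# Hypothetical sketch of the "at least three shared categories" rule.
# Names and data layout are assumptions, not the site's real code.
MIN_SHARED_CATEGORIES = 3  # threshold stated in the text above


def shared_rated_categories(ratings_a, ratings_b):
    """Return the sorted category names in which both models have a rating."""
    return sorted(set(ratings_a) & set(ratings_b))


def chart_is_available(ratings_a, ratings_b, minimum=MIN_SHARED_CATEGORIES):
    """True when the two models share enough rated categories to compare."""
    return len(shared_rated_categories(ratings_a, ratings_b)) >= minimum


# Assumption: the arena scores shown on this page are treated as
# Text-to-Image category ratings; no other category ratings are listed.
gpt_image_1_mini = {"Text-to-Image": 25.3}
wan_2_7 = {"Text-to-Image": 19.0}

print(chart_is_available(gpt_image_1_mini, wan_2_7))
```

With only one shared rated category, the check fails, which matches the "Not enough comparable category data" notice.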

Wan 2.7

19.0 arena score

#34 of 44 in Text-to-Image

Vote tally

Where the votes landed

[Chart: win rate for GPT Image 1 Mini, ties, and win rate for Wan 2.7]

Shared challenges: 3

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Chalkboard Menu

Text-to-Image

“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”

AI Judge Analysis

GPT Image 1 Mini

+ Excellent chalk texture that feels authentic to the medium
+ Perfect rendering of all requested text with no spelling errors
+ Accurate representation of hand-sketched lettering with varying strokes

− The title is not in the requested 'elegant cursive' style
− The composition is cropped a bit too tightly within the frame

Wan 2.7

+ Beautiful cafe atmosphere and lighting in the background
+ Captures an elegant cursive-inspired flair in the lettering
+ Excellent spelling and text layout

− The text looks more like a digital font or vector overlay than authentic chalk
− The drop shadows and perfectly clean edges break the illusion of a hand-drawn board

Verdict: GPT Image 1 Mini wins because it accurately captures the texture and feeling of real chalk on a board, adhering to the requirement for a 'realistic chalk handwriting style' without looking digital. While Wan 2.7 has a more appealing background and better cursive adherence for the title, the text appears as a clean digital font, failing the prompt's core stylistic constraint.

Outfit Transfer Challenge

Editing

Edit instruction

“Use Image 1 as the base person. Dress them in the exact elaborate outfit from Image 2 (including all layers, accessories, jewelry, and shoes). Carefully adapt the clothing to the body shape and pose in Image 1 while maintaining realistic fabric behavior, correct proportions, and perfect lighting/shadow matching. Keep the person’s exact face, hair, and background completely unchanged.”

AI Judge Analysis

GPT Image 1 Mini

+ Successfully replicates the specific outfit items (pea coat, plaid scarf, blue jeans) from Image 2.
+ Maintains consistent skin textures and lighting that matches the original beach scene.

− Significantly changes the model's facial features and bone structure, losing the identity from Image 1.
− Changes the background and perspective of the wooden structure.

Wan 2.7

+ Perfectly preserves the source person's face, hair, and the background environment from Image 1.
+ Correctly adapts the new clothing to the person's original pose and body shape.

− Fails to use the outfit from Image 2, instead generating a generic gothic/elaborate coat.
− The generated clothing looks slightly 'stuck on' with some flat shading.

Verdict: GPT Image 1 Mini followed the clothing instructions much better but failed the preservation task by completely changing the person's face and the environment. Wan 2.7 perfectly preserved the source person and background, which is the primary goal of an image edit, even though it hallucinated a different 'elaborate' outfit instead of the specific one shown in Image 2. Wan 2.7 is preferred for maintaining the subject's identity.

The Capybara Taxi Driver

Text-to-Image

“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”

AI Judge Analysis

GPT Image 1 Mini

+ Excellent photorealistic lighting and depth of field consistent with a night scene.
+ Captures the bored, nonchalant expression of the passenger perfectly.
+ High-quality fur texture and realistic garment integration on the capybara.

− The passenger's hands and phone are slightly blurry and less defined.
− Only one paw is clearly visible on the steering wheel.

Wan 2.7

+ Successfully places both paws on the steering wheel as requested.
+ Wide shot provides more context of the yellow taxi and Manhattan city background.
+ Clearer rendering of the passenger's face and phone.

− The passenger is seated in the front passenger seat instead of the back seat.
− The capybara's head looks poorly composited onto the body, with a noticeable seam at the neck.
− The lighting is overly bright and lacks the atmosphere of a night-time taxi ride.

Verdict: GPT Image 1 Mini is the superior image due to its high level of photorealism, atmospheric lighting, and accurate placement of the passenger in the back seat. While Wan 2.7 followed the instruction to put both paws on the steering wheel more closely, it failed the spatial requirement for the passenger's location and suffered from poor anatomical blending and unrealistic, daytime-like lighting.