GPT Image 2 (OpenAI) vs. Grok Imagine Image (xAI)

Settled by community votes across 7 shared challenges, with an AI judge weighing in on each.

GPT Image 2

28.2 arena score

#3 of 44 in Text-to-Image

Top 3 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.

Grok Imagine Image

24.1 arena score

#19 of 44 in Text-to-Image

Vote tally

Where the votes landed

GPT Image 2

100.0%

win rate

Ties

0.0%

Grok Imagine Image

0.0%

win rate

Shared challenges 7
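
The percentages in the tally are plain shares of the total. As a rough sketch of that arithmetic (not the arena's actual scoring code), the snippet below turns a list of outcomes into win and tie percentages. The function name `tally`, the labels "a", "b" and "tie", and the choice of one outcome per shared challenge are all assumptions made for illustration; the page does not say whether it counts per vote or per challenge.

```python
from collections import Counter


def tally(outcomes):
    """Return percentage shares for model A wins, ties, and model B wins.

    `outcomes` is a list of strings, each one "a", "b", or "tie".
    These labels are hypothetical and chosen only for this sketch.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(outcomes)  # labels that never occur count as 0
    total = len(outcomes)
    return {key: round(100.0 * counts[key] / total, 1) for key in ("a", "tie", "b")}


# Hypothetical input: seven outcomes, all in favour of model A.
shares = tally(["a"] * 7)
print(shares)  # {'a': 100.0, 'tie': 0.0, 'b': 0.0}
```

With seven outcomes all going to one model, the shares come out as 100.0%, 0.0% and 0.0%, which matches the shape of the tally above.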

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

GPT Image 2

Grok Imagine Image

AI Judge Analysis

GPT Image 2

+ Exceptional text rendering with coherent descriptions and accurate spelling
+ Highly professional grid layout that mimics a real-world graphic design product
+ Clear categorization with functional pricing and contact information

− None significant, though it adheres strictly to a standard corporate style

Grok Imagine Image

+ Good use of vibrant food photography and negative space
+ Captures the vibrant and casual aesthetic requested in the prompt

− Garbled, unreadable placeholder text for descriptions
− Repetitive menu items like multiple 'Grilled Salmon' and 'Steak Frites' entries
− Poor image alignment and floating food elements

Verdict: GPT Image 2 (Model A) is vastly superior, producing a fully functional, professional-grade menu with legible text and a logical layout. Grok Imagine Image (Model B) suffers from significant AI artifacts, including illegible text and repetitive content, which make the design unusable for its intended purpose.

Magic Burger Explosion: Fiery Photorealism Challenge

Text-to-Image

“Ad for 'Magic Burger'. Dynamic, exploded burger with all components (bun, patty, cheese, lettuce, tomato, sauce) suspended in mid-air. Emphasize photorealistic detail and a sense of motion. Dark, fiery background with glowing embers. Integrate text: 'MAGIC BURGER' as a prominent title, 'LIMITED TIME ONLY' as a secondary message, and '€6.99' in a starburst, all rendered with a fiery, glowing effect.”

GPT Image 2

Grok Imagine Image

Votes: GPT Image 2 100%, ties 0%, Grok Imagine Image 0%

AI Judge Analysis

GPT Image 2

+ Excellent photorealistic texture on the burger patty and lettuce.
+ Perfect adherence to the fiery, glowing text effect requested.
+ Superior composition that creates a professional advertising layout.

− The 'LIMITED TIME ONLY' box is slightly cluttered with sparks, making it a bit busy.

Grok Imagine Image

+ Strong sense of movement with the splash effect of the sauces.
+ Clean, legible text for the secondary message.

− Lighting on the burger is inconsistent with the fiery background.
− The burger components look more like 3D assets than photorealistic food.
− The '€6.99' starburst lacks the fiery, glowing effect requested in the prompt.

Verdict: GPT Image 2 is much more successful, delivering a high-end commercial aesthetic with incredible textures and perfect adherence to the complex lighting and text requirements. In contrast, Grok Imagine Image feels more like clip-art with flatter lighting and fails to apply the requested fiery effect to all text elements.

Chalkboard Menu

Text-to-Image

“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”

GPT Image 2

Grok Imagine Image

AI Judge Analysis

GPT Image 2

+ Excellent adherence to the elegant cursive requirement for the title.
+ Superior chalk texture with realistic pressure and tapering in the strokes.
+ Stronger composition with better spacing and a more authentic wooden frame.

− Small text at the bottom is slightly less legible than the main body.

Grok Imagine Image

+ Perfect spelling of all menu items and additional text.
+ Very clear and legible handwriting style.
+ Good use of chalk smudges on the board for added realism.

− Failed the requirement for an 'elegant cursive' title, using print instead.
− The text layout feels a bit cramped toward the bottom of the board.
− The chalk texture looks slightly more digital and uniform than GPT Image 2's.

Verdict: GPT Image 2 is the clear winner because it followed the specific stylistic instruction for an 'elegant cursive' title, whereas Grok Imagine Image used a standard print style. GPT Image 2 also captured a much more authentic chalk texture on the 'Today's Specials' heading, making it look genuinely hand-drawn.

Pose & Character Mashup

Editing

Edit instruction

“Use Image 1 as the exact pose reference and Image 2 as the character reference. Recreate the person/character from Image 2 in the exact dynamic pose and body position from Image 1. Keep the exact face, hair, clothing style/details, and expression from Image 2. Match the lighting and environment of Image 1. The final image must show the character from Image 2 performing the precise action/pose from Image 1 with perfect anatomy and natural integration.”

Source

GPT Image 2

Grok Imagine Image

AI Judge Analysis

GPT Image 2

+ Excellent adherence to the character reference, including face, sunglasses, scarf, and black clothing.
+ Perfectly recreates the complex dynamic pose from the reference image.
+ Matches the lighting and yellow studio background of the source image correctly.

− The fingers on the raised hand are distorted and lack detail.
− Some artifacts are visible where the scarf meets the clothing.

Grok Imagine Image

+ Perfectly preserves the original Image 1 without any changes.

− Completely failed to perform the edit requested.
− Did not incorporate any elements from the character reference (Image 2).

Verdict: GPT Image 2 (Model A) successfully followed the complex instruction to merge the character from Image 2 with the pose and environment of Image 1. Grok Imagine Image (Model B) failed the task entirely, simply returning the first source image without any modifications.

Outfit Transfer Challenge

Editing

Edit instruction

“Use Image 1 as the base person. Dress them in the exact elaborate outfit from Image 2 (including all layers, accessories, jewelry, and shoes). Carefully adapt the clothing to the body shape and pose in Image 1 while maintaining realistic fabric behavior, correct proportions, and perfect lighting/shadow matching. Keep the person’s exact face, hair, and background completely unchanged.”

Source

GPT Image 2

Grok Imagine Image

Votes: GPT Image 2 100%, ties 0%, Grok Imagine Image 0%

AI Judge Analysis

GPT Image 2

+ Perfectly replicates the outfit from Image 2, including the specific scarf pattern and coat texture.
+ Retains the person's exact face and hair from Image 1 with high fidelity.
+ Seamlessly blends the new clothing into the lighting and pose of the original beach scene.

− None notable; it successfully completed all parts of the multi-image instruction.

Grok Imagine Image

+ Matches the person's face and hair from Image 1 well.
+ Maintains the background environment accurately.

− Completely ignored the clothing in Image 2, generating a generic 'elaborate' royal outfit instead.
− The added right hand contains anatomical glitches and does not match the person's skin patterns correctly.

Verdict: GPT Image 2 followed the complex multi-step instructions perfectly, accurately transferring the specific clothing from the reference image while keeping the subject's identity and background intact. Grok Imagine Image failed the task by ignoring the visual reference for the outfit and inventing a different style of clothing, while also introducing anatomical errors in the added hand.

The Capybara Taxi Driver

Text-to-Image

“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”

GPT Image 2

Grok Imagine Image

AI Judge Analysis

GPT Image 2

+ Excellent photographic texture and lighting within the car
+ Highly accurate capybara anatomy and 'professional' expression
+ Realistic use of depth of field focusing on the protagonist

− The passenger is slightly out of focus given her importance in the prompt
− Only one paw is clearly visible on the steering wheel

Grok Imagine Image

+ Perfect adherence to showing both paws on the steering wheel
+ Very clear depiction of the bored human businesswoman in the same focal plane
+ Strong background composition that effectively screams New York City

− Physical layout error where the passenger appears to be in the front seat instead of the back seat
− The taxi light is strangely placed on the interior ceiling or low-profile roof

Verdict: GPT Image 2 (Model A) provides superior photorealism and character detail, capturing the 'professional' vibe of the capybara perfectly. However, Grok Imagine Image (Model B) followed specific instructions better regarding the paws and the passenger's expression, though it failed the spatial logic by placing the passenger in the front passenger seat.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

GPT Image 2

Grok Imagine Image

AI Judge Analysis

GPT Image 2

+ Excellent typography with perfect spelling and accent marks.
+ Highly detailed engraving style with excellent use of texture and shading.
+ Sophisticated composition that perfectly captures the vintage, high-end cafe aesthetic.

− The 'minimalist' instruction was interpreted as 'vintage ornate' rather than simple vector.

Grok Imagine Image

+ Clean vector style lines which lean closer to a modern minimalist logo.
+ Accurate colors and text rendering.

− Includes redundant 'Est. 1720' text appearing twice in the logo.
− The cloche clutters the design by trying to integrate a coffee cup and spoon into its shape, which looks awkward.
− The steam is overly thick compared to the elegance of the typeface.

Verdict: GPT Image 2 (Model A) is the clear winner, producing a professional, cohesive, and historically appropriate logo that perfectly matches the 'Caffè Florian' brand identity. While Grok Imagine Image (Model B) attempts a more minimalist vector approach, it suffers from redundant text and a cluttered, poorly integrated central icon.