OpenAI's state-of-the-art image generation model with arbitrary resolution up to 4K and strong instruction following
Settled by community votes across 7 shared challenges, with an AI judge weighing in on each.
GPT Image 2
#3 of 44 in Text-to-Image
Grok Imagine Image Pro
#14 of 44 in Text-to-Image
Where the votes landed
GPT Image 2
100.0%
win rate
Ties
0.0%
Grok Imagine Image Pro
0.0%
win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Modern Clean Menu
Text-to-Image: “Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”
AI Judge Analysis
GPT Image 2
- + Exceptional text rendering with perfect spelling and legibility.
- + Well-organized professional layout with consistent iconography and sections.
- + Accurate adherence to 'bold sans-serif fonts' and 'vibrant accents'.
- − The logo 'NOVA' has a slightly distressed texture that clashes with the clean minimalist prompt.
Grok Imagine Image Pro
- + Strong image quality and lighting within the food photography.
- + Captures the basic grid layout and minimalist white background.
- − Failed to render legible item descriptions, containing many spelling errors and nonsensical text.
- − Layout feels repetitive as food descriptions for different pizzas are identical copies of each other.
- − Typography is thin and stylized rather than the 'bold sans-serif' requested.
Verdict: GPT Image 2 (Model A) is vastly superior as it produces a functional, professional-grade menu with perfect text and thoughtful graphic design elements. In contrast, Grok Imagine Image Pro (Model B) suffers from significant legibility issues and repetitive placeholder-style text that renders the design unusable for its intended purpose.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
GPT Image 2
- + Excellent adherence to the 'elegant cursive' title request.
- + High-quality chalk texture with realistic grain and smudging.
- + Perfect spelling and formatting for all menu items and prices.
- − The slant in the handwriting is very subtle, bordering on perfectly upright.
Grok Imagine Image Pro
- + Strong chalk texture and realistic varied line weights.
- + Good layout that fills the board space effectively.
- + Followed all text instructions with accurate spelling and pricing.
- − Failed to provide 'elegant cursive' for the title, using a block print style instead.
- − Handwriting style fluctuates significantly between items, failing the 'same handwriting' requirement.
Verdict: GPT Image 2 is the superior overall image because it successfully followed the stylistic requirement for an elegant cursive title while maintaining a consistent handwriting style throughout the board. Grok Imagine Image Pro failed to use cursive for the title, and its handwriting varied between the listed items rather than appearing to come from the same hand.
Pose & Character Mashup
Editing: “Use Image 1 as the exact pose reference and Image 2 as the character reference. Recreate the person/character from Image 2 in the exact dynamic pose and body position from Image 1. Keep the exact face, hair, clothing style/details, and expression from Image 2. Match the lighting and environment of Image 1. The final image must show the character from Image 2 performing the precise action/pose from Image 1 with perfect anatomy and natural integration.”
AI Judge Analysis
GPT Image 2
- + Successfully replicates the specific character from Image 2 including face, hair, and accessories.
- + Matches the complex pose and background from Image 1 nearly perfectly.
- + Integrates clothing elements from both images effectively into the final composition.
- − The fingers on the upper right hand are distorted and unnatural.
- − There is some slight blurring where the hair meets the neck.
Grok Imagine Image Pro
- + Maintains high visual clarity and a clean aesthetic.
- + Preserves the original environment and lighting of Image 1.
- − Completely failed the character reference instruction, ignoring the person in Image 2.
- − The face generated does not match either the source images or the requested character.
- − Failed to incorporate any clothing details (scarf, sunglasses, black sweater) from Image 2.
Verdict: GPT Image 2 followed the complex instructions excellently, successfully transplanting the specific character from Image 2 into the difficult pose from Image 1 while maintaining his likeness and accessories. Grok Imagine Image Pro completely ignored the character reference, simply generating a generic woman in the original pose. GPT Image 2 is the clear winner for its high level of adherence to the multi-step editing prompt.
The Reversed Rodeo
Text-to-Image: “Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”
AI Judge Analysis
GPT Image 2
- + Excellent adherence to the 'horse riding astronaut' concept with functional-looking saddlery.
- + Highly detailed textures on the spacesuit and horse fur.
- + Strong cinematic lighting and realistic lunar surface.
- − The astronaut's glove anatomy on the left hand is slightly distorted.
- − The composition is a bit tight, cropping the horse's head.
Grok Imagine Image Pro
- + Vibrant, colorful nebula background and planet for a more 'space' feel.
- + Clean, artistic composition with a sense of zero-gravity weightlessness.
- − Failed the specific prompt instruction of 'riding'; the horse is floating above the astronaut rather than being on top in a riding posture.
- − Anatomy of the horse's legs is awkward, and the astronaut's fingers are poorly defined.
Verdict: GPT Image 2 followed the complex spatial instructions perfectly, depicting a horse literally riding a saddled astronaut on the moon. Grok Imagine Image Pro produced a more vibrantly colored image, but failed the unique 'horse on top' riding instruction, instead showing two separate entities floating near each other.
Outfit Transfer Challenge
Editing: “Use Image 1 as the base person. Dress them in the exact elaborate outfit from Image 2 (including all layers, accessories, jewelry, and shoes). Carefully adapt the clothing to the body shape and pose in Image 1 while maintaining realistic fabric behavior, correct proportions, and perfect lighting/shadow matching. Keep the person’s exact face, hair, and background completely unchanged.”
AI Judge Analysis
GPT Image 2
- + Successfully transferred the exact clothing items from Image 2.
- + Preserved the person's face, hair, and background with high fidelity.
- − The clothing texture is slightly smoother than the original pea coat.
Grok Imagine Image Pro
- + Preserved the face and background elements correctly.
- − Completely failed to use the clothing from Image 2.
- − Visual artifacts present on the hands and rings.
Verdict: GPT Image 2 followed the edit instructions perfectly, accurately transferring the pea coat, scarf, and jeans from the reference image onto the target person. Grok Imagine Image Pro failed the core task by generating a generic royal outfit instead of the specific clothing requested in Image 2.
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
GPT Image 2
- + Excellent photorealistic texture on the capybara fur.
- + Effective use of depth of field and bokeh lighting.
- + Natural interaction with the steering wheel.
- − The passenger is heavily blurred and less distinct.
- − The composition is a bit tight, losing some city context.
Grok Imagine Image Pro
- + Strong composition showing both subjects and the city clearly.
- + Accurate rendering of taxi meters and interior details.
- + Captures the bored expression of the businesswoman perfectly.
- − The capybara's paws have slightly unnatural, overly long claws.
- − The lighting is somewhat flat and less cinematic than the competitor.
Verdict: Both models followed the complex prompt well, with GPT Image 2 offering superior fur textures and cinematic atmosphere. However, Grok Imagine Image Pro is preferred because it provides a better side-by-side view of the driver and passenger, shows a clearer city background, and includes more specific taxi-related details.
Vintage Cafe Logo
Text-to-Image: “Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”
AI Judge Analysis
GPT Image 2
- + Excellent classic typography with ornate serif detailing.
- + Intricate hand-drawn texture that perfectly fits the vintage aesthetic.
- + High-quality vector emblem composition with sophisticated framing.
- − The 'FLORIAN' text is slightly less minimalist than requested.
Grok Imagine Image Pro
- + Successfully follows the minimalist requirement.
- + Clean, clear layout that is very legible.
- + Accurate text rendering for both the name and the date.
- − The 'Est. 1720' banner is very basic and lacks the requested vintage flair.
- − The steam effect is a bit too simple, feeling more like modern clip art.
Verdict: GPT Image 2 provides a much more convincing 'vintage' interpretation with rich texture and superior typography that feels authentic to the 1720 era requested. While Grok Imagine Image Pro is more 'minimalist', it lacks the artistic depth and high-end branding feel found in GPT Image 2's output, appearing more like a modern digital reconstruction.
Explore each model
xAI's premium image generation model offering higher fidelity output and stronger performance on single-image editing benchmarks compared to the standard Grok Imagine model