OpenAI's state-of-the-art image generation model with better instruction following and adherence to prompts
Settled by community votes across 16 shared challenges, with an AI judge weighing in on each.
GPT Image 1.5
#7 of 44 in Text-to-Image
Grok Imagine Image Pro
#14 of 44 in Text-to-Image
Where the votes landed
GPT Image 1.5: 50.0% win rate
Ties: 9.1%
Grok Imagine Image Pro: 40.9% win rate
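As a rough illustration of how shares like these are derived, the sketch below converts raw vote counts into percentages. The counts (11, 2, 9) are hypothetical, chosen only because they reproduce the displayed figures; the page does not report raw vote totals, and `vote_shares` is an illustrative helper, not part of any arena API.

```python
def vote_shares(wins_a: int, ties: int, wins_b: int) -> dict[str, float]:
    """Return each outcome's share of all votes as a percentage (one decimal)."""
    total = wins_a + ties + wins_b
    if total == 0:
        raise ValueError("no votes recorded")
    return {
        "a": round(100 * wins_a / total, 1),
        "ties": round(100 * ties / total, 1),
        "b": round(100 * wins_b / total, 1),
    }


# Hypothetical counts (an assumption, not reported by the page):
# 11 votes for model A, 2 ties, 9 votes for model B.
shares = vote_shares(11, 2, 9)
print(shares)
```

With these assumed counts the three shares sum to 100%, which is the property the displayed win rates and tie rate also satisfy.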
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
The Capybara Taxi Driver
Text-to-Image: “Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”
AI Judge Analysis
GPT Image 1.5
- + Excellent texture on the capybara's fur and the taxi interior
- + Cinematic lighting that accurately reflects a night scene
- + Natural character interactions and expressions
- − The passenger is slightly out of focus
- − Composition is very tight and crops out the city environment
Grok Imagine Image Pro
- + Clearer view of the Manhattan street environment and taxi exterior
- + Highly detailed and readable text on the taxi cap
- + Captures the 'bored expression' of the passenger perfectly
- − The texture of the capybara's fur is slightly smooth and less realistic
- − The steering wheel appears to be emerging from the capybara's arm/jacket rather than being held
Verdict: GPT Image 1.5 offers a more photorealistic and intimate shot with superior lighting and textures, making the surreal scene feel more grounded. Grok Imagine Image Pro provides a better sense of scale and environment, with impressive text rendering on the cap, but falls slightly behind on the physical integration of the capybara with the car's controls.
Outfit Transfer Challenge
Editing: “Use Image 1 as the base person. Dress them in the exact elaborate outfit from Image 2 (including all layers, accessories, jewelry, and shoes). Carefully adapt the clothing to the body shape and pose in Image 1 while maintaining realistic fabric behavior, correct proportions, and perfect lighting/shadow matching. Keep the person’s exact face, hair, and background completely unchanged.”
AI Judge Analysis
GPT Image 1.5
- + Excellent adherence to the clothing reference (peacoat, scarf, jeans, sunglasses, watch)
- + Maintains the subject's face, hair, and vitiligo patterns with high accuracy
- + Natural lighting and shadows that match the beach setting
- − Slightly changes the background details like the wooden post texture
- − Introduces minor sunglasses artifacts where they meet the face
Grok Imagine Image Pro
- + Perfect preservation of the original background and pose
- + Subject's face and hair are nearly identical to the source image
- − Completely failed to use the outfit from Image 2, generating a generic royal costume instead
- − Skin tone on the newly generated hands does not match the subject's face and vitiligo
Verdict: GPT Image 1.5 successfully followed the complex instruction to transfer the specific outfit from Image 2 onto the person in Image 1, including accessories like the scarf and watch. In contrast, Grok Imagine Image Pro completely ignored the reference image for the clothing, creating an unrelated regal outfit and failing to maintain consistent skin tones on the hands.
Geometric Composition
Text-to-Image: “A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”
AI Judge Analysis
GPT Image 1.5
- + Clearly depicts the blue sphere inside the cube.
- + Highly realistic glass reflections and refractions of the background plant.
- + Excellent lighting consistency from the left window.
- − The sphere is quite large relative to the cube, rather than 'small' as requested.
Grok Imagine Image Pro
- + Good composition with a more interesting plant choice (Monstera).
- + Follows the instruction for a 'small' sphere better than GPT Image 1.5.
- + Accurate placement of all requested elements.
- − The glass cube has a significant rendering error where a second sphere or reflection appears physically detached on the right.
- − The perspective of the cube's base and the table surface is slightly warped.
Verdict: GPT Image 1.5 produced a much more coherent and realistic image with physically accurate glass behavior and lighting. While Grok Imagine Image Pro attempted a better scale for the 'small' sphere, it suffered from a major glitch inside the cube that looks like a duplicated object, and its glass textures are less convincing.
The Reversed Rodeo
Text-to-Image: “Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”
AI Judge Analysis
GPT Image 1.5
- + Excellent surface textures on the lunar ground and space suit
- + High cinematic detail in the background planets and nebula
- − Completely failed the negative constraint; the astronaut is riding on top of the horse
- − The anatomy of the horse's front legs is messy and fused with dust
Grok Imagine Image Pro
- + Successfully followed the specific spatial instruction for the horse to be on top
- + Vibrant colors and a more surreal composition involving a pegasus
- − The astronaut is floating underneath rather than being 'ridden' by the horse in a literal sense
- − The scale feels slightly flat compared to the depth in GPT Image 1.5's image
Verdict: GPT Image 1.5 failed the primary challenge of the prompt, producing a standard 'astronaut riding a horse' image despite the 'not vice versa' instruction. Grok Imagine Image Pro correctly interpreted the spatial requirement by placing the horse above the astronaut, meeting the surreal nature of the request more effectively.
Candid Street Photography
Text-to-Image: “A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”
AI Judge Analysis
GPT Image 1.5
- + Excellent depiction of light rain with visible droplets on the jacket and bicycle
- + Very convincing motion blur from the passing car as requested
- + Effective 'imperfect framing' that enhances the candid, street-photography aesthetic
- − The rear bicycle wheel has some structural clipping and AI artifacts near the gear system
Grok Imagine Image Pro
- + Natural and detailed skin texture on the man's face and hands
- + Clear reflections in the puddles on the pavement
- + Accurate representation of a red Japanese-style utility bicycle
- − The 'motion blur' on the cars in the background feels static or like light trails rather than a moving vehicle
- − The scene lacks the atmosphere of rain; the man and bike look almost entirely dry
Verdict: GPT Image 1.5 followed the atmospheric instructions much better, capturing the texture of light rain and the specific motion blur of a passing car, which gives it a more authentic 'candid' feel. While Grok Imagine Image Pro has nice skin details, it fails to make the subject actually appear as if he is standing in the rain, making the scene feel staged.
Fantasy Warrior
Text-to-Image: “Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”
AI Judge Analysis
GPT Image 1.5
- + Excellent depiction of skin texture, sweat, and subtle scarring.
- + Superior lighting effects with warm torchlight glints on the metal and hair.
- + Very high level of detail on the ornate engraving and leather textures.
- − The hair braids and beads are a bit messy and less distinct than in the other model.
- − The composition is a bit more 'standard' fantasy portraiture.
Grok Imagine Image Pro
- + Impressive rendering of legible Latin text ('Lux in tenebris') on the gorget.
- + Unique hair styling with very clear beads and braided structure.
- + Great balance between the rugged character face and the highly ornate armor.
- − The skin texture appears slightly more smoothed and less 'battle-worn' than in GPT Image 1.5's image.
- − The warmth of the torchlight is less integrated into the highlights of the face.
Verdict: GPT Image 1.5 wins on sheer photographic realism and the 'battle-worn' aesthetic, providing incredible detail in the skin, scars, and dynamic lighting. Grok Imagine Image Pro is also excellent, particularly for its ability to render legible text on the armor and distinct braided details, but it feels slightly more like a rendered game character compared to the lifelike quality of the GPT image.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
GPT Image 1.5
- + Perfectly rendered handwriting with very natural chalk texture and pressure variations.
- + Excellent adherence to the request for an elegant cursive title.
- + Highly realistic smudging and chalk dust on the board surface.
- − The board lacks a physical frame, making it feel slightly less like a physical object in a café.
- − Some letters in the smaller text at the bottom are slightly inconsistent in size.
Grok Imagine Image Pro
- + Clear and legible handwriting with good layout across the board.
- + Includes a realistic wooden frame and café background which adds to the requested atmosphere.
- + Successfully completed the truncated text 'Brown But...' with the logical 'Brown Butter Chocolate Chip Cookies'.
- − The handwriting looks slightly more 'digital' and perfect compared to the organic feel of GPT Image 1.5's board.
- − The title is in all-caps rather than the 'elegant cursive' specifically requested in the prompt.
- − The chalk texture is a bit grainy and uniform across all letters.
Verdict: GPT Image 1.5 followed the stylistic instructions more closely, particularly regarding the elegant cursive title and the hyper-realistic physical texture of chalk on slate. While Grok Imagine Image Pro provided a better overall composition by including the frame and café background, its failure to use cursive for the title and its slightly more mechanical-looking handwriting make GPT Image 1.5 the winner for prompt adherence.
Pose & Character Mashup
Editing: “Use Image 1 as the exact pose reference and Image 2 as the character reference. Recreate the person/character from Image 2 in the exact dynamic pose and body position from Image 1. Keep the exact face, hair, clothing style/details, and expression from Image 2. Match the lighting and environment of Image 1. The final image must show the character from Image 2 performing the precise action/pose from Image 1 with perfect anatomy and natural integration.”
AI Judge Analysis
GPT Image 1.5
- + Successfully applied the character's clothing, face, and accessories from Image 2.
- + Followed the background and lighting style from Image 1.
- + Captures the essence of the complex leg-crossing pose.
- − The torso orientation is upright rather than hunched/horizontal like Image 1.
- − The foot anatomy is distorted with too many toes.
- − The right arm placement is simplified rather than fully extended like the source.
Grok Imagine Image Pro
- + Maintains the exact technical body position and composition of Image 1.
- − Completely failed to use the character from Image 2.
- − Ignored the instruction to change the face, hair, and clothing.
- − Simply regenerated Image 1 with minor facial tweaks.
Verdict: GPT Image 1.5 successfully combined the two images by placing the character from Image 2 into the pose of Image 1, though it struggled with some anatomical details and precise torso angle. Grok Imagine Image Pro failed the core instruction, essentially mimicking Image 1 and ignoring the character reference (Image 2) entirely. GPT Image 1.5 is the preferred model for following the multi-image composition instructions.
Modern Clean Menu
Text-to-Image: “Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”
AI Judge Analysis
GPT Image 1.5
- + Excellent text rendering with clear item names, descriptions, and prices
- + Functional layout that balances text and imagery like a real menu
- + High-quality, appetizing food photography that looks professional
- − The 'Mains' text is slightly cut off at the bottom
- − Grid layout for photos is less uniform than requested
Grok Imagine Image Pro
- + Very clean and precise 3x3 photo grid
- + High-quality, vibrant food images with consistent lighting
- + Minimalist aesthetic follows the visual prompt closely
- − Completely fails to include the actual menu items, descriptions, or pricing
- − Impractical as a restaurant menu without text content
Verdict: GPT Image 1.5 produced a fully functional and professional restaurant menu with impeccable text rendering and a logical layout. Grok Imagine Image Pro interpreted the 'grid' instruction well but failed to include any of the text content required for a menu, resulting in just a collection of photos. GPT Image 1.5 is the clear winner for its superior prompt adherence and utility.
Bald man challenge
Image Editing: “Give the person a full, thick head of natural hair with realistic texture, density, and a natural hairline. Preserve facial features and lighting.”
AI Judge Analysis
GPT Image 1.5
- + Natural, curly hair texture that matches the beard style
- + Adds hair to the sideburn area seamlessly
- − Slightly alters the original facial features, making the face look a bit younger/smoother
- − Visible blending artifact near the top of the glasses frame
Grok Imagine Image Pro
- + Excellent preservation of the original facial features and skin texture
- + Perfectly maintains the background and clothing without any shifts
- + Realistic hairline and lighting integration
- − Hair looks slightly 'pasted on' on the far left side where it meets the background
Verdict: Grok Imagine Image Pro is the winner because it successfully added hair while perfectly preserving the identity, facial wrinkles, and glasses of the subject from the source image. GPT Image 1.5 provided a good result but noticeably altered the subject's face, making him look like a different person with similar features.
Adorable Baby Animals in Sunny Meadow
Text-to-Image: “Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”
AI Judge Analysis
GPT Image 1.5
- + Excellent depiction of fur texture and soft lighting
- + Tight, engaging composition that feels more intimate
- + Perfect adherence to the animal count and species requested
- − The butterfly scale is a bit large compared to the animals
Grok Imagine Image Pro
- + Dynamic poses showcasing movement and play
- + Beautiful wide landscape with good depth of field
- + Clean rendering of the meadow and flowers
- − Failed the prompt by including two kittens instead of one
- − The fox's tail and hind leg anatomy look slightly awkward
Verdict: GPT Image 1.5 is the winner because it followed all instructions, including the specific count of animals, and produced a more cohesive and heartwarming masterpiece with 'ultra-detailed' fur. Grok Imagine Image Pro had a lovely landscape but failed the prompt by adding an extra kitten and had slightly less realistic fur textures.
Over-the-top cartoon caricature
Editing: “Create a caricature of me and my job. Make it exaggerated and humorous, incorporating my profession as a TV show anchor and my love for dogs and hockey.”
AI Judge Analysis
GPT Image 1.5
- + Excellent character likeness preserved while applying the caricature style.
- + High visual quality with vibrant colors and professional studio lighting.
- + Clever integration of hobbies, especially the dog wearing a hockey helmet.
- − The hand holding the microphone has a slight anatomical issue with the thumb placement.
Grok Imagine Image Pro
- + Strong prompt adherence with many elements including a trophy and multiple dogs.
- + Good text rendering for 'Pups & Pucks'.
- + Effective caricature magnification of facial features.
- − Loses significant likeness of the woman in the source image.
- − Overall composition is a bit cluttered and chaotic compared to GPT Image 1.5's.
- − The hockey stick held by the woman is thin and warped.
Verdict: GPT Image 1.5 is the clear winner because it manages to create a humorous caricature while still being instantly recognizable as the woman from the source image. Grok Imagine Image Pro includes more elements from the prompt but loses the specific facial identity of the subject, opting for a more generic caricature face.
Studio Ghibli Anime Style
Editing: “Transform this photo into a Studio Ghibli–inspired illustration. Use soft pastel colors, hand-painted textures, gentle lighting, dreamy backgrounds, and a warm, nostalgic mood”
AI Judge Analysis
GPT Image 1.5
- + Excellent atmospheric lighting and glow consistent with Ghibli's 'dreamy' themes
- + Successfully captures the soft pastel palette requested in the prompt
- + Maintains the characteristic expressions and composition of the source image while stylizing them
- − The excessive glow makes some edges and details feel a bit too blurry
- − The textures feel more like digital 'noise' or sparkles than hand-painted watercolor
Grok Imagine Image Pro
- + Perfectly captures the hand-painted watercolor texture typical of Ghibli background art
- + Excellent preservation of the original image's character silhouettes and layout
- + Clean line work that mimics traditional cel animation
- − Colors are slightly less 'dreamy' or vibrant than requested
- − The lighting is flat compared to the requested 'gentle lighting' and nostalgic mood
Verdict: Both models did an excellent job transforming the 'Distracted Boyfriend' meme into a Ghibli style. GPT Image 1.5 leaned into the emotional atmosphere with warm lighting and soft focus, while Grok Imagine Image Pro excelled at recreating the physical medium of watercolor and ink. Grok Imagine Image Pro is the likely winner for its superior 'hand-painted' texture and cleaner preservation of the source image's identity.
Golden Hour Stroll
Image Editing: “Add dynamic motion to this photo: make hair blow in the wind, add leaves flying, energetic and lively feel.”
AI Judge Analysis
GPT Image 1.5
- + Excellent hair motion effect that looks integrated with the person
- + Large number of leaves creates a strong sense of wind
- + High preservation of the original subjects' faces and clothing
- − Some leaves appear flat or like stickers overlaying the image
- − A few leaves have slightly unnatural coloration
Grok Imagine Image Pro
- + Leaves have more varied colors and natural shapes
- + Good hair motion that follows the wind direction
- + Maintains original image quality and details perfectly
- − Fewer leaves result in a slightly less 'energetic' feel than requested
- − A few leaves appear slightly blurry in the foreground
Verdict: Both models did an exceptional job of following the edit instructions while preserving the source image. GPT Image 1.5 feels more 'dynamic' and 'lively' because of the sheer volume of leaves, while Grok Imagine Image Pro feels a bit more natural but slightly more conservative with the effect. GPT Image 1.5 is the winner for more fully realizing the scale of the request.
Vintage Cafe Logo
Text-to-Image: “Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”
AI Judge Analysis
GPT Image 1.5
- + Excellent typography with era-appropriate flourishes.
- + Beautiful hand-drawn texture and lighting on the cloche.
- + High-quality vector emblem aesthetic with a balanced vertical layout.
- − Ignored the 'light background' instruction, opting for solid black.
- − The cloche is brown rather than a classic metallic look.
Grok Imagine Image Pro
- + Adhered perfectly to the light background and warm brown/cream color palette.
- + Excellent minimalist vector style that feels modern yet vintage.
- + Accurate rendering of the requested 'Est. 1720' banner and steam.
- − The steam element is a bit overly simplistic and disconnected.
- − The cloche is grey, which slightly clashes with the warm brown/cream theme.
Verdict: GPT Image 1.5 produces a more visually stunning and atmospheric logo with superior typography, but it completely fails to follow the instruction for a light background. Grok Imagine Image Pro successfully follows every part of the prompt, including color scheme and background, delivering a clean and professional minimalist vector logo.
Apollo 11: Journey to Tranquility
Text-to-Image: “Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vector style, crisp lines, consistent iconography, subtle gradients only. Steps (stop at landing): 1. Launch (Saturn V icon) 2. Earth Orbit (Earth + orbit ring icon) 3. Translunar (trajectory arc icon) 4. Lunar Orbit (Moon + orbit ring icon) 5. Descent (lunar module descending icon) 6. Landing (lunar module on the surface icon) Small supporting elements (minimal text): • Crew strip: three silhouette icons with only last names: Armstrong, Aldrin, Collins. • Landing site marker: Moon pin labeled "Tranquility" only. Layout constraints: generous margins, large readable labels, clean background with subtle stars. Vector-only, print-poster look, high resolution.”
AI Judge Analysis
GPT Image 1.5
- + Excellent illustration quality with detailed but clean vector icons.
- + Highly legible, large typography for each step.
- + Followed the color palette and flat-vector style requirements perfectly.
- − The layout is a bit cramped at the top with overlapping elements.
- − The trajectory in 'Translunar' is a bit abstracted compared to the other literal steps.
Grok Imagine Image Pro
- + Perfectly balanced, professional infographic composition.
- + Flawless text rendering and numbering for all six steps.
- + Strict adherence to the 'consistent iconography' and 'NASA-inspired palette' instructions.
- − Icons are much smaller and less detailed than GPT Image 1.5's.
- − The background is slightly plain, feeling more like a slide than a poster.
Verdict: Both models followed the prompt exceptionally well, capturing the NASA aesthetic and the six specific mission steps. GPT Image 1.5 features much stronger and more vibrant individual illustrations, while Grok Imagine Image Pro excels in professional layout, spacing, and precise text rendering. Grok Imagine Image Pro is chosen as the winner for its superior infographic structure, which feels more like a finished, functional design.
Explore each model
xAI's premium image generation model offering higher fidelity output and stronger performance on single-image editing benchmarks compared to the standard Grok Imagine model