GPT Image 1.5 vs Grok Imagine Image
Head-to-head across 17 challenges
GPT Image 1.5: 43.9% win rate
Ties: 12.2%
Grok Imagine Image: 43.9% win rate
Challenge Results
Magic Burger Explosion: Fiery Photorealism Challenge
Text-to-Image: “Ad for 'Magic Burger'. Dynamic, exploded burger with all components (bun, patty, cheese, lettuce, tomato, sauce) suspended in mid-air. Emphasize photorealistic detail and a sense of motion. Dark, fiery background with glowing embers. Integrate text: 'MAGIC BURGER' as a prominent title, 'LIMITED TIME ONLY' as a secondary message, and '€6.99' in a starburst, all rendered with a fiery, glowing effect.”
AI Judge Analysis
GPT Image 1.5
- + Excellent typography integrated naturally into the fiery theme
- + High-quality, realistic food textures on the bun and patty
- + Great sense of energy and chaos with the sparks and splashing sauce
- − The 'exploded' effect is a bit condensed compared to the vertical separation in Grok Imagine Image's output
Grok Imagine Image
- + Stronger 'exploded' composition with clear separation of ingredients
- + Very clean and readable price starburst
- + Vibrant colors and high contrast
- − The fire effects look somewhat like clip-art compared to the food
- − The lettuce and sauce droplets have a slightly plastic, CG appearance
Verdict: GPT Image 1.5 wins due to its superior photorealistic textures and masterful integration of text into the environment. While Grok Imagine Image has a better 'exploded' layout, its overall visual quality feels more like a digital illustration than a professional advertisement.
Geometric Composition
Text-to-Image: “A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”
AI Judge Analysis
GPT Image 1.5
- + Perfect adherence to all spatial requirements in the prompt.
- + Excellent rendering of materials, especially the glass texture and the book's fabric cover.
- + Very realistic lighting and reflections on the sphere and table.
- − The plant in the background is quite dense, making the 'partially visible' effect through the glass slightly harder to see than in Grok Imagine Image's output.
- − The cube is slightly wider than it is tall, though it still reads as a cube.
Grok Imagine Image
- + Beautiful lighting and soft focus depth-of-field.
- + The plant is clearly visible through the glass as requested.
- + Good wooden table texture.
- − The blue sphere is floating inexplicably in the center of the cube, which feels physically unnatural.
- − The 'cube' is a tall, upright rectangular box rather than a true cube.
- − The left edge of the glass cube looks slightly warped.
Verdict: GPT Image 1.5 is the superior choice because it captures the physical logic of the scene correctly, placing the sphere on the bottom surface of the cube rather than letting it float. While Grok Imagine Image has a very pleasing photographic quality and soft lighting, its failure to maintain a cube shape and the floating sphere make it less grounded and accurate to the prompt than GPT Image 1.5.
Fantasy Warrior
Text-to-Image: “Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”
AI Judge Analysis
GPT Image 1.5
- + Exceptional skin texture with realistic dirt and faint scarring.
- + Superior rendering of metal materials, showing realistic wear, grime, and torchlight reflections.
- + Accurate interpretation of 'close portrait' with high-fidelity detail on the braids and beads.
- − The bokeh sparks are very large and slightly distracting.
- − A minor rendering glitch where a braid appears to merge into the shoulder armor.
Grok Imagine Image
- + Elegant engraving patterns on the armor plates.
- + Good use of actual torches in the background to justify the lighting.
- + Well-executed braids with clear beads as requested.
- − Skin texture is too smooth and 'plastic' for a battle-worn character.
- − The armor looks too clean and lacks the 'battle-worn' feel requested in the prompt.
- − The leather strap detail is less realistic than in GPT Image 1.5's output.
Verdict: GPT Image 1.5 captures the 'battle-worn' aesthetic much more effectively with realistic skin texture, grime, and weathered armor. While Grok Imagine Image has beautiful engravings and clear prompt adherence for the beads and braids, it feels too much like a clean studio photoshoot and lacks the lifelike grit found in GPT Image 1.5.
Man and Car in California
Editing: “Make a photo of the man driving the car down the California coastline”
AI Judge Analysis
GPT Image 1.5
- + Expertly preserves the likeness of the man from the source image, including his unique hairstyle and accessories.
- + Correctly maintains the specific interior and exterior details of the white Rolls-Royce Phantom Drophead Coupe.
- + Shows high source image preservation by keeping the man's scarf and coat details visible in the seat.
- − The man is sitting on the right side of the car, which for a North American coastline setting (California) would typically be the passenger side, though the car is right-hand drive.
- − The composition is heavily cropped, losing much of the car's body.
Grok Imagine Image
- + Captures the entire car in motion with a dynamic low-angle composition.
- + Accurately renders the California coastline background with realistic lighting and motion blur on the wheels.
- − Complete failure to preserve the man from the source image, replacing him with a generic older white man.
- − The car model was changed from a Phantom Drophead Coupe to a newer Rolls-Royce Dawn/Wraith variant.
- − Failed the primary objective of the image editing task by ignoring the provided subjects.
Verdict: GPT Image 1.5 successfully performed the edit by combining the two source images, maintaining the specific identity of the man and the exact model of the car. In contrast, Grok Imagine Image ignored the provided subjects entirely, generating a generic stock photo of a different man and a different car model, failing the fundamental requirement of an image editing task.
Candid Street Photography
Text-to-Image: “A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”
AI Judge Analysis
GPT Image 1.5
- + Excellent adherence to the request for 'imperfect framing' with a tight, grounded composition.
- + Very realistic skin textures and wet fabric details.
- + The bicycle's structure is complex and rendered reasonably well for an AI-generated image.
- − The motion blur on the car is minimal, appearing more like a static parked car.
- − The person's hands are slightly merged and messy upon close inspection.
Grok Imagine Image
- + Captured the 'motion blur from passing cars' perfectly, giving a true sense of a busy street.
- + Higher fidelity to the 'candid street photo' aesthetic with a more natural snapshot feel.
- + Includes authentic details like the face mask and Japanese signage.
- − The bicycle frame is physically impossible, with the down tube missing and the seat post floating.
- − Rain is less visible than in GPT Image 1.5's output.
Verdict: GPT Image 1.5 produces a much more detailed and technically sound image of the subject and the bicycle, with great textures. Grok Imagine Image captures the 'candid street' atmosphere and the requested 'motion blur' much better, but its bicycle geometry falls apart. GPT Image 1.5 is the winner for overall visual coherence and for following the 'no stylization' instruction without sacrificing structural integrity.
Modern Clean Menu
Text-to-Image: “Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”
AI Judge Analysis
GPT Image 1.5
- + Excellent text rendering with clear, legible fonts and descriptions.
- + Highly realistic and professional food photography that matches the menu items.
- + Very usable layout that feels like a real graphic design asset.
- − The grid is only on the right side rather than the whole layout being a grid.
- − The requested sections are not fully realized: 'pizza', for example, sits under a shared header style with only a limited list of items.
Grok Imagine Image
- + Creative integration of food photos scattered within the text layout.
- + Strong minimalist aesthetic with a good use of white space.
- + Follows the 'sections' prompt more strictly by including all requested titles.
- − Text is largely illegible gibberish, especially the small descriptions.
- − Repetitive menu items (e.g., 'Steak Frites' and 'Grilled Salmon' appear multiple times).
- − Food photos are small and lack the high-detail clarity seen in the competitor.
Verdict: GPT Image 1.5 produced a highly professional, production-ready menu with perfect text legibility and mouth-watering photography. Grok Imagine Image followed the structural layout of a full page better but failed significantly on text rendering, producing repetitive and nonsensical placeholder text that makes the menu unusable.
Bald man challenge
Editing: “Give the person a full, thick head of natural hair with realistic texture, density, and a natural hairline. Preserve facial features and lighting.”
AI Judge Analysis
GPT Image 1.5
- + Excellent preservation of facial features and textures.
- + The hair texture and color match the existing beard perfectly.
- + Flawlessly maintains the background and clothing from the source image.
- − The hair volume is very high, which may slightly alter the head shape's silhouette compared to the original.
Grok Imagine Image
- + Good integration of a realistic hairstyle that fits the character's age.
- + Preserves the vast majority of the source image correctly.
- − Noticeable change to the facial features, particularly around the eyes and bridge of the nose.
- − The hair texture is slightly finer and lighter than the beard, creating a minor mismatch in appearance.
Verdict: GPT Image 1.5 is the clear winner as it successfully adds natural-looking hair that matches the beard perfectly while leaving the face and background untouched. Grok Imagine Image introduces subtle but noticeable changes to the man's facial features and glasses, failing to fully preserve the source identity.
Night Sky Transformation
Editing: “Change the scene to night: a deep, dark sky with subtle, glistening stars visible behind the mountain.”
AI Judge Analysis
GPT Image 1.5
- + Excellent atmosphere with a deep, dark sky as requested.
- + Realistic reduction of light on the mountain slopes to match the night setting.
- + Added a convincing starry sky that feels integrated into the scene.
- − The town in the valley is significantly darkened, losing some of the 'night light' charm of the original.
- − Overall image is perhaps too dark, making some of the landscape details hard to see.
Grok Imagine Image
- + Higher preservation of the foreground town's brightness and detail.
- + Excellent source preservation of the mountain's shape and snow patterns.
- + Clear, distinct stars that fulfill the 'glistening' part of the prompt.
- − The lighting on the mountain faces feels a bit too bright for a midnight sky.
- − The transition between the mountain and the sky is slightly soft.
Verdict: Both models followed the instructions well, but Grok Imagine Image achieved a better balance between the night sky and landscape visibility. While GPT Image 1.5 created a more realistic 'deep dark' ambiance, it sacrificed too much detail in the town, whereas Grok Imagine Image maintained the original's warmth and composition while adding a beautiful night sky.
Over-the-top cartoon caricature
Editing: “Create a caricature of me and my job. Make it exaggerated and humorous, incorporating my profession as a tv show anchor and my love for dogs and hockey.”
AI Judge Analysis
GPT Image 1.5
- + Excellent caricature style with exaggerated features that still maintain the subject's likeness.
- + Very detailed composition including a news desk, microphone, cameras, and multiple dogs.
- + Great integration of the hockey theme with a live game on the screen and a dog wearing a helmet.
- − The hands have some classic AI anatomical issues (varying finger counts and merging shapes).
Grok Imagine Image
- + Successfully captures all prompt elements: news anchor, dogs, and hockey.
- + Clean, professional-looking illustration style.
- + Humorous depiction of a dog ice skating in the background.
- − The facial features are less 'caricatured' and more like a standard bobblehead, losing some of the subject's personality.
- − The composition feels a bit flatter and more generic than GPT Image 1.5's.
Verdict: GPT Image 1.5 wins this challenge by providing a much more expressive caricature that captures the subject's energy and smile more accurately than the competition. While Grok Imagine Image followed the prompt perfectly, its 'big head' style feels a bit more like a stiff template compared to the dynamic and richly detailed scene created by GPT Image 1.5.
Adorable Baby Animals in Sunny Meadow
Text-to-Image: “Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”
AI Judge Analysis
GPT Image 1.5
- + Excellent depiction of dynamic movement, with the kitten playfully tumbling on its back.
- + Detailed fur textures and lighting that feels integrated with the golden sunrise.
- + Realistic animal anatomy that captures the specific 'baby' proportions of all four species.
- − One of the kitten's front paws has an irregular number of digits/pads.
- − The background is quite busy with bokeh highlights that slightly distract from the subjects.
Grok Imagine Image
- + Strong implementation of the 'god rays' emanating from the sun.
- + Clean composition with animals framed nicely by tall wildflowers.
- + Vibrant color palette that emphasizes the 'wholesome' vibe.
- − The animals look more like stylized plush toys than 'hyper-photorealistic' creatures.
- − Missing the 'tumbling' and 'chasing' action requested in the prompt, as the animals are mostly sitting still.
- − The insects (butterflies/bees) are very small and lack the detail seen in GPT Image 1.5's output.
Verdict: GPT Image 1.5 followed the prompt much more effectively by capturing the 'tumbling' and 'chasing' action of the animals, whereas Grok Imagine Image produced a more static, posed portrait. Additionally, GPT Image 1.5 achieved a higher level of photorealism and texture detail, while Grok Imagine Image had a smoother, more artificial finish on the animals' fur and eyes.
Heroic Super Hero Portrait
Text-to-Image: “Hyper-photorealistic full-body portrait of a female superhero standing triumphantly on a New York skyscraper rooftop at golden sunset, wearing a classic modest superhero costume with flowing cape, chest emblem, gloves, and boots in red and blue colors, practical design, short hair, strong determined heroic expression looking into the distance, powerful confident stance with hands on hips and cape billowing dramatically in the wind, detailed urban cityscape background, warm natural sunlight with sharp shadows and fabric highlights, ultra-sharp textures on suit, hair, and concrete, 8K masterpiece, empowering family-friendly style.”
AI Judge Analysis
GPT Image 1.5
- + Excellent adherence to the 'modest' requirement with a traditional full-coverage tunic and boots.
- + Superior detailed urban background featuring recognizable New York architecture.
- + Very high resolution and skin texture detail, giving a more realistic photographic feel.
- − The hands/fingers on the hips have slight anatomical distortions.
- − The costume texture is a bit generic and consistent throughout.
Grok Imagine Image
- + Dynamic composition with a strong silhouette and dramatic cape flow.
- + Good lighting integration with the sunset reflecting off the suit's gold accents.
- + The character's expression and profile feel more focused and heroic.
- − The background is hazy and lacks the requested 'detailed urban cityscape' of New York.
- − The costume design is somewhat messy with redundant flaps and confusing red/blue patterns.
- − One hand is clenched while the other is on the hip, partially missing the 'hands on hips' instruction.
Verdict: GPT Image 1.5 is the preferred model because it accurately rendered the 'modest' costume and the 'detailed urban cityscape' of New York requested in the prompt. While Grok Imagine Image has a more dramatic silhouette, its background is generic and the costume design lacks the clear, practical structure found in GPT Image 1.5.
Intricate Floral Mandala
Text-to-Image: “Perfectly symmetrical mandala made entirely of real flowers, petals, leaves, fruits, and seeds in vibrant natural colors, intricate layered patterns with radial symmetry, top-down view on a soft neutral background, hyper-detailed organic textures and subtle shadows, photorealistic, 8K masterpiece.”
AI Judge Analysis
GPT Image 1.5
- + Displays near-perfect radial symmetry across every layer of the mandala.
- + Comprehensive inclusion of all requested elements: flowers, petals, leaves, fruits, and seeds.
- + Highly photorealistic textures with consistent lighting and subtle shadows that create a 3D effect.
- − The dense composition makes the outer edges feel slightly crowded against the frame.
- − The central sunflower is slightly less detailed compared to the outer concentric rings.
Grok Imagine Image
- + Striking, vibrant color palette with high contrast.
- + Good use of negative space in the layout, allowing individual elements to stand out.
- + Effective lighting and varied organic textures.
- − Fails the 'perfectly symmetrical' requirement, with many elements misaligned or asymmetrical in the corners.
- − The composition is cut off at the top, bottom, and sides, failing the top-down mandala layout.
- − Several elements appear duplicated or unnaturally smooth, losing the requested photorealism.
Verdict: GPT Image 1.5 followed the prompt meticulously, delivering a complete, perfectly symmetrical mandala with a huge variety of organic materials and consistent 8K-level detail. Grok Imagine Image produced a vibrant image but failed significantly on the core requirements of symmetry and containing the full mandala within the frame, resulting in many cropped and lopsided elements.
Neutral Expression to Genuine Smile
Editing:
{
"action": "image_edit",
"reference": "uploaded neutral portrait",
"change": "Warm genuine Duchenne smile: lips curved up, slight natural teeth, soft eye crinkles, subtle cheek raise",
"details": "Realistic smiling skin (dimples if present, soft cheek shadows), slightly brighter eyes; keep exact eye shape/color/iris",
"preserve_exact": "Face identity/structure, eyes/nose/lips/eyebrows, hair, skin texture/pores/freckles, makeup, clothing, head pose, background, lighting, shadows, framing",
"no_changes": "No face shape change, no new features, no gaze shift, no hair/clothing/lighting/background edits",
"style": "Ultra-photorealistic 8K portrait, sharp face focus, natural soft lighting, realistic skin glow"
}
AI Judge Analysis
GPT Image 1.5
- + Perfect preservation of hair strands and clothing texture from the original.
- + Excellent implementation of high-frequency skin details like freckles and pores.
- + More natural 'Duchenne' eye crinkles and genuine mouth shape.
Grok Imagine Image
- + Successfully applies a smile while maintaining overall facial structure.
- + Good lighting consistency with the original image.
- − Hair texture is noticeably smoothed out compared to the original.
- − The eyes were altered slightly, losing the specific squint requested.
- − A small artifact or mole appeared on the right side of the face that wasn't in the original.
Verdict: GPT Image 1.5 achieved a much higher level of source preservation, maintaining every individual hair strand and skin freckle from the neutral portrait. Grok Imagine Image applied a decent smile but noticeably smoothed the hair and skin textures, failing the 'preserve exact' constraint in the fine details.
Studio Ghibli Anime Style
Editing: “Transform this photo into a Studio Ghibli–inspired illustration. Use soft pastel colors, hand-painted textures, gentle lighting, dreamy backgrounds, and a warm, nostalgic mood”
AI Judge Analysis
GPT Image 1.5
- + Excellent soft pastel color palette and warm, nostalgic mood.
- + Captures the dreamy, glowy lighting often found in Ghibli films.
- + Successfully styles the background with a painterly, impressionistic feel.
- − The facial features of the woman in red are significantly altered from the original.
- − The character designs lean more towards generic modern anime than the specific Ghibli aesthetic.
- − Loss of some spatial depth due to the heavy texture overlay.
Grok Imagine Image
- + Better preservation of character likeness while translating to an illustrative style.
- + Excellent adherence to actual Studio Ghibli character design (features, eyes, hair shading).
- + Maintains clear structural elements of the original photo while applying the hand-painted texture.
- − The sky and lighting are a bit flatter than the 'dreamy' request specified.
- − The red truck in the background is a bit of a literal addition not present in the original.
Verdict: Both models did an excellent job of stylizing the famous meme. GPT Image 1.5 captured the requested 'dreamy' and 'warm' mood more effectively through its lighting, but Grok Imagine Image provided a much more accurate 'Studio Ghibli' character aesthetic while better preserving the physical traits of the people in the original photo. Grok Imagine Image is the winner for its superior balance of artistic transformation and source preservation.
Golden Hour Stroll
Editing: “Add dynamic motion to this photo: make hair blow in the wind, add leaves flying, energetic and lively feel.”
AI Judge Analysis
GPT Image 1.5
- + Excellent hair motion that flows naturally from the model's head
- + Motion blur on the leaves enhances the sense of dynamic movement
- + Leaves match the green/yellow summer palette of the original background
- − The leaf placement feels a bit cluttered in the foreground
- − Slight change to the model's facial features compared to the source
Grok Imagine Image
- + Successfully adds blowing hair and flying leaves
- + Maintains better facial consistency with the source image woman
- + Adds motion to the dog's ears, which contributes to the 'energetic' feel
- − The leaves are an autumnal orange/brown which clashes with the lush green background
- − The hair edit has some transparency issues where the background is visible through solid strands
- − Leaves appear static and 'pasted on' without much motion blur
Verdict: GPT Image 1.5 is the winner because it better captures the 'dynamic motion' request with effective use of motion blur on the leaves and a very natural flow to the hair. While Grok Imagine Image does a great job of adding movement to the dog's ears and preserving the face, the choice of autumn leaves in a green summer setting feels incongruous, and the lack of blur makes the effect feel less energetic.
Vintage Cafe Logo
Text-to-Image: “Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”
AI Judge Analysis
GPT Image 1.5
- + Excellent typography with a classic, hand-lettered vintage feel.
- + Very accurate rendering of the 'Est. 1720' banner as requested.
- + High-quality vector aesthetic with appropriate stippled shading texture.
- − The steam effect is a bit simple compared to the detailed cloche dome.
Grok Imagine Image
- + Strong minimalist vector lines and clean composition.
- + Good use of subtle paper texture in the background.
- − Redundant text, repeating 'Est. 1720' twice.
- − The cloche includes an odd handle or spoon artifact protruding from the side.
- − Typography is a bit generic for a 'vintage' request.
Verdict: GPT Image 1.5 followed the prompt much more effectively, specifically concerning the request for a banner and unique vintage typography. Grok Imagine Image included redundant text and a strange visual artifact protruding from the side of the cloche, detracting from the logo's professional quality.
Apollo 11: Journey to Tranquility
Text-to-Image: “Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vector style, crisp lines, consistent iconography, subtle gradients only. Steps (stop at landing): 1. Launch (Saturn V icon) 2. Earth Orbit (Earth + orbit ring icon) 3. Translunar (trajectory arc icon) 4. Lunar Orbit (Moon + orbit ring icon) 5. Descent (lunar module descending icon) 6. Landing (lunar module on the surface icon) Small supporting elements (minimal text): • Crew strip: three silhouette icons with only last names: Armstrong, Aldrin, Collins. • Landing site marker: Moon pin labeled "Tranquility" only. Layout constraints: generous margins, large readable labels, clean background with subtle stars. Vector-only, print-poster look, high resolution.”
AI Judge Analysis
GPT Image 1.5
- + Excellent typography with clean, readable sans-serif fonts
- + Perfect adherence to the requested NASA-inspired color palette
- + Professional hierarchy with clear horizontal sectioning
- − The Step 1 and Step 2 icons are combined/confused in the top section
- − The rocket in the first section doesn't match the Saturn V silhouette as closely as Grok Imagine Image's does
Grok Imagine Image
- + Accurate, distinct icons for all 6 requested steps
- + Clean, spaced-out layout that works well for an infographic
- + Incorporated a high-quality NASA logo and crew names correctly
- − Several typos in secondary text, such as '3rajcoory' and 'Transluiory'
- − The colors are a bit more saturated than the requested 'muted' palette
Verdict: GPT Image 1.5 produces a much more polished and professional design that looks like a real infographic, whereas Grok Imagine Image contains several distracting typos. While Grok Imagine Image followed the specific step-by-step numbering more accurately, the superior visual quality and clean text of GPT Image 1.5 make it the better choice.
GPT Image 1.5
OpenAI's state-of-the-art image generation model with better instruction following and adherence to prompts
Grok Imagine Image
An image generation model by xAI designed to generate highly aesthetic images from text descriptions.