GPT Image 1 Mini vs Grok Imagine Image

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

GPT Image 1 Mini

Grok Imagine Image

0% wins 0% ties 100% wins

AI Judge Analysis

GPT Image 1 Mini

+ Excellent adherence to the glass cube geometry with clear, defined edges.
+ Very high-quality texture on the book and wooden table.
+ Perfectly captured the soft window lighting from the left.

− The blue sphere is quite large, whereas the prompt asked for a 'small' one.
− The plant is more to the side than strictly 'behind' the cube.

Grok Imagine Image

+ Accurately depicted a 'small' blue sphere as requested.
+ The plant is positioned directly behind the cube according to the prompt.
+ The lighting and shadows on the table are very realistic.

− The 'glass cube' appears more like a rectangular prism or tall block than a cube.
− The edges of the book and glass contain some minor artifacts where they meet.

Verdict: Both models followed the complex spatial instructions well. GPT Image 1 Mini produced a more visually pleasing image with superior textures and a perfect cube, though it ignored the 'small' descriptor for the sphere. Grok Imagine followed the 'small' and 'behind' prompts more accurately, but the central object is not a cube and the overall composition feels slightly more cluttered. GPT Image 1 Mini is the winner for its superior aesthetic quality and structural accuracy.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

GPT Image 1 Mini

Grok Imagine Image

AI Judge Analysis

GPT Image 1 Mini

+ Excellent skin texture and natural facial features on the man.
+ Very realistic depiction of wet asphalt and rain reflections.
+ High visual quality with a convincing shallow depth of field.

− The white car in the background is static, missing the requested motion blur.
− The bike's kickstand and frame geometry are physically impossible.

Grok Imagine Image

+ Perfectly captures the 'motion blur from passing cars' request.
+ The 'imperfect framing' feels much more authentic to a candid street photo.
+ Film-like aesthetics that match the 50mm lens and no-stylization request.

− The man's face is obscured and less detailed than in the other image.
− The bicycle spokes and frame details are a bit messy upon close inspection.

Verdict: GPT Image 1 Mini produces a more detailed and aesthetically pleasing portrait, but it fails to incorporate the requested motion blur for the cars. Grok Imagine captures the requested 'candid' and 'motion blur' elements much more effectively, resulting in a more convincing street photography look despite the lower detail on the subject's face.

Fantasy Warrior

Text-to-Image

“Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”

GPT Image 1 Mini

Grok Imagine Image

50% wins 0% ties 50% wins

AI Judge Analysis

GPT Image 1 Mini

+ Excellent weathered texture on the plate armor engravings
+ Strong cinematic lighting consistent with torchlight
+ Realistic facial skin texture and natural eyes

− Failed to include the specific request for beads in the hair
− Braids look more like dreadlocks or matted hair than clean braids

Grok Imagine Image

+ Perfectly adhered to the request for beads in the braided hair
+ Intense, lifelike eyes with clear reflections
+ Excellent representation of the leather straps and cloth underlayer

− The torch flame in the background is a bit distracting and rendered more sharply than the requested bokeh sparks
− The dirt on the face looks slightly more like makeup or paint than natural battle grime

Verdict: Grok Imagine is the superior choice for this prompt as it successfully included almost every specific detail, including the hair beads and the leather/cloth underlayers which were largely missing or obscured in the other image. While GPT Image 1 Mini produced a very cinematic and gritty texture on the armor, it failed to render the specific decorative elements requested for the character's hair.

Chalkboard Menu

Text-to-Image

“Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”

GPT Image 1 Mini

Grok Imagine Image

AI Judge Analysis

GPT Image 1 Mini

+ Excellent text legibility and accuracy
+ Uniform chalk texture across all characters
+ Balanced and centered composition

− Text looks slightly too clean, bordering on a digital font filtered to look like chalk
− Title is in print capitals rather than the requested 'elegant cursive'

Grok Imagine Image

+ Highly authentic chalk texture with realistic smudges and dust on the board
+ Handwriting has more natural variations in slant and pressure
+ Captured the 'elegant cursive' elements better in certain parts of the script

− Slight alignment issues with some letters and prices
− The background is slightly more distracting compared to the clean frame of GPT Image 1 Mini

Verdict: Both models followed the complex text prompt exceptionally well with zero spelling errors. GPT Image 1 Mini provides a cleaner, more legible result but feels slightly more artificial, whereas Grok Imagine delivers a much more convincing chalk-on-board texture with realistic smudging and human-like handwriting imperfections.

Pose & Character Mashup

Editing

Edit instruction

“Use Image 1 as the exact pose reference and Image 2 as the character reference. Recreate the person/character from Image 2 in the exact dynamic pose and body position from Image 1. Keep the exact face, hair, clothing style/details, and expression from Image 2. Match the lighting and environment of Image 1. The final image must show the character from Image 2 performing the precise action/pose from Image 1 with perfect anatomy and natural integration.”

Source

GPT Image 1 Mini

Grok Imagine Image

100% wins 0% ties 0% wins

AI Judge Analysis

GPT Image 1 Mini

+ Successfully integrated the character's clothing, face, and accessories.
+ Maintained the yellow background and red ottoman from Image 1.
+ Matched the lighting style of the environment.

− Failed to replicate the exact complex leg-crossing pose from Image 1.
− The character's body orientation is more upright than the requested dynamic lean.
− Anatomy of the foot on the ottoman is slightly distorted.

Grok Imagine Image

+ Maintained the high-quality resolution of the source image.

− Completely failed the edit instruction to change the character.
− Simply output a copy of Image 1 without any character features from Image 2.
− Zero adherence to the request to swap the person.

Verdict: GPT Image 1 Mini followed the complex multi-image instruction by successfully placing the character from Image 2 into the setting and style of Image 1, even if the exact leg-cross pose was slightly simplified. Grok Imagine Image failed the task entirely, returning an unedited version of Image 1 with no character modifications. GPT Image 1 Mini is the clear winner for actually performing the requested synthesis.

Outfit Transfer Challenge

Editing

Edit instruction

“Use Image 1 as the base person. Dress them in the exact elaborate outfit from Image 2 (including all layers, accessories, jewelry, and shoes). Carefully adapt the clothing to the body shape and pose in Image 1 while maintaining realistic fabric behavior, correct proportions, and perfect lighting/shadow matching. Keep the person’s exact face, hair, and background completely unchanged.”

Source

GPT Image 1 Mini

Grok Imagine Image

100% wins 0% ties 0% wins

AI Judge Analysis

GPT Image 1 Mini

+ Excellent replication of the specific outfit layers (scarf, coat, belt, and watch).
+ High visual quality with realistic fabric rendering and lighting.

− Failed to preserve the original person's face, hair, and vitiligo patterns accurately.
− Changed the composition and background slightly instead of a pixel-perfect overlay.

Grok Imagine Image

+ Perfect preservation of the original person's face, hair, and specific skin vitiligo textures.
+ Kept the background and composition exactly as the source image.

− Completely ignored the clothing in Image 2, generating a generic 'elaborate' royal outfit instead.
− The hand integration with the new clothing is physically awkward.

Verdict: Both models failed the complex instruction in different ways. GPT Image 1 Mini was the only one that used the specific outfit from Image 2, though it failed to preserve the base person's identity. Grok Imagine preserved the person perfectly but completely ignored the provided reference image for the clothing. GPT Image 1 Mini made the slightly better attempt at image-to-image editing, even though it struggled with identity preservation.

The Capybara Taxi Driver

Text-to-Image

“Photorealistic scene inside a yellow New York taxi at night. A capybara is driving, wearing a yellow taxi driver cap and a dark jacket. It has a calm, professional expression and both front paws on the steering wheel. In the back seat sits a human businesswoman in a coat, looking at her phone with a completely normal, bored expression (as if this is just another normal ride). Through the windows you can see the streets of Manhattan at night with blurred lights. Realistic taxi interior, photorealistic, detailed fur and fabric, 35mm lens, night lighting with reflections, shallow depth of field.”

GPT Image 1 Mini

Grok Imagine Image

AI Judge Analysis

GPT Image 1 Mini

+ Excellent fur texture and photorealistic lighting
+ Cinematic composition with a shallow depth of field
+ Accurate depiction of a professional 'taxi driver' cap and jacket

− The passenger is blurry and out of focus
− Only one paw is clearly on the steering wheel

Grok Imagine Image

+ Perfect adherence to showing both paws on the steering wheel
+ Very clear depiction of the passenger and her expression
+ Strong sense of setting with the visible Manhattan street through the windshield

− Compositional error places the passenger in the front seat instead of the back seat
− The capybara's paws look slightly like bird talons or sharp claws
− The cap looks more like a baseball cap than a traditional taxi driver hat

Verdict: GPT Image 1 Mini produces a much more realistic and cinematic image with superior lighting and texture. While Grok Imagine captures more of the specific prompt details like 'both paws', it fails the core compositional instruction by placing the passenger in the front seat, whereas GPT Image 1 Mini correctly places her in the back.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

GPT Image 1 Mini

Grok Imagine Image

100% wins 0% ties 0% wins

AI Judge Analysis

GPT Image 1 Mini

+ Natural, dynamic composition that captures the requested 'tumbling' and 'chasing' action.
+ Excellent anatomical realism for all four animal types.
+ Subtle, realistic lighting and soft fur textures that align with the hyper-photorealistic request.

− The god rays are a bit soft, though they are present.

Grok Imagine Image

+ Strong, dramatic god rays and vibrant golden hour lighting.
+ Lush flower variety in the foreground.

− Static, posed composition fails to capture the 'chasing' and 'tumbling' action requested.
− Artificial, doll-like appearance of the animals that borders on 'uncanny' rather than photorealistic.
− Insects look like generic white blobs rather than clear butterflies.

Verdict: GPT Image 1 Mini is the clear winner as it successfully captures the energy and movement of the animals described in the prompt while maintaining a high level of photorealism. Grok Imagine Image produces a more static, AI-stylized result with animals that look like figurines, and it ignores the 'chasing and tumbling' part of the prompt in favor of a portrait layout.
