FLUX.2 [flex] vs Grok Imagine Image

Head-to-head across 10 challenges

FLUX.2 [flex]

70.0%

win rate

Ties

10.0%

Grok Imagine Image

20.0%

win rate


Challenge Results

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

FLUX.2 [flex]
Grok Imagine Image

AI Judge Analysis

FLUX.2 [flex]

  • + Perfectly adheres to all spatial prompts including object placement.
  • + Excellent rendering of light and soft shadows.
  • + Clean, modern aesthetic with high clarity.
  • The sphere is quite large, pushing the definition of 'small' in the prompt.

Grok Imagine Image

  • + Captures the 'small' aspect of the blue sphere more accurately.
  • + Highly realistic wood texture and light scattering in the glass.
  • + Natural-looking plant and depth of field.
  • The glass object is a rectangular prism rather than a cube.
  • The sphere appears to be floating unnaturally in the center without support.

Verdict: Both models followed the complex spatial instructions well. FLUX.2 [flex] produced a better 'cube' and superior lighting, while Grok Imagine Image provided a more realistic texture on the wooden table and better followed the 'small' descriptor for the sphere. FLUX.2 [flex] is the winner for better geometric accuracy regarding the cube and more cohesive composition.

Man and Car in California

Editing
Edit instruction

“Make a photo of the man driving the car down the California coastline”

Source
FLUX.2 [flex]
Grok Imagine Image
FLUX.2 [flex]: 100% wins | Ties: 0% | Grok Imagine Image: 0% wins

AI Judge Analysis

FLUX.2 [flex]

  • + Excellent preservation of the specific man's identity, including his hairstyle and clothing (visible plaid scarf).
  • + High accuracy in preserving the specific car model (Rolls-Royce Phantom Drophead Coupé) from the source image.
  • + Realistic motion blur on the road and wheels that enhances the sense of driving.
  • The man's scale and positioning in the driver's seat feel slightly off, appearing a bit small for the car.
  • The steering wheel placement is slightly detached from the driver's grip.

Grok Imagine Image

  • + Beautifully rendered California coastline background with great atmospheric perspective.
  • + The composition of the car on the road feels very dynamic and professional.
  • Fails to use the man from the source image, replacing him with a generic older white man.
  • Changes the car model slightly, evolving the classic Phantom front into a more modern Rolls-Royce Dawn style.
  • Does not follow the multi-image prompt requirement to combine both source images.

Verdict: FLUX.2 [flex] is the clear winner because it successfully followed the core instruction: combining the specific man and the specific car from the source images into the requested new setting. While Grok Imagine Image produced a high-quality visual, it completely ignored the subject's identity, replacing him with a random character, which defeats the purpose of an image editing task.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

FLUX.2 [flex]
Grok Imagine Image

AI Judge Analysis

FLUX.2 [flex]

  • + Excellent adherence to technical photography prompts like 50mm shallow depth of field.
  • + Detailed skin textures and realistic rain/wet pavement interaction.
  • + Dynamic composition with effective use of bokeh and light.
  • The bicycle frame geometry is slightly warped/nonsensical near the crank set.
  • The scale of the bicycle seems a bit small relative to the man.

Grok Imagine Image

  • + Achieves a highly authentic 'street photography' look with realistic motion blur from cars.
  • + The bicycle design is more structurally plausible for a real-world bike.
  • + Captures the 'imperfect framing' prompt well with its candid feel.
  • The subject's face is obscured and partially covered by a mask, losing the 'elderly Japanese man' facial detail requested.
  • Overall lighting is a bit flatter and less 'cinematic' than technically requested.

Verdict: FLUX.2 [flex] produced a more visually striking and detailed image that followed the technical lighting and texture prompts more closely, though the bicycle's anatomy is slightly glitched. Grok Imagine Image captured the candid 'street photo' vibe and the motion blur of passing cars much more realistically, but it failed to showcase the facial details of the subject. FLUX.2 [flex] is the winner for its superior clarity and beautiful rendering of light and rain.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

FLUX.2 [flex]
Grok Imagine Image
FLUX.2 [flex]: 0% wins | Ties: 0% | Grok Imagine Image: 100% wins

AI Judge Analysis

FLUX.2 [flex]

  • + Strict adherence to the 3x2 grid layout for food photos.
  • + Clean, professional typography that is highly legible.
  • + Excellent use of color-coded headers for organization.
  • Text content is mostly gibberish.
  • Limited variety in food images, with some repetition.

Grok Imagine Image

  • + High accuracy in text rendering, including recognizable dish names like 'Bruschetta' and 'Margherita'.
  • + Dynamic and visually appealing layout with organic placement of food photos.
  • + Impressive variety in the types of food depicted.
  • Failed to follow the 'grid' requirement for photo placement.
  • Contains several duplicate item entries (e.g., multiple 'Steak Frites' and 'Grilled Salmon').

Verdict: FLUX.2 [flex] adhered much better to the specific layout requirements, providing a clean grid and clear sectioning, though the text is nonsensical. Grok Imagine Image produced much more legible and accurate text for a menu, but failed to follow the grid layout instruction and had several repetitive list entries.

Isometric Miniature Diorama Scenes

Text-to-Image

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”

FLUX.2 [flex]
Grok Imagine Image
FLUX.2 [flex]: 50% wins | Ties: 0% | Grok Imagine Image: 50% wins

AI Judge Analysis

FLUX.2 [flex]

  • + Clean, professional typography and layout.
  • + Excellent miniature 3D aesthetic with soft, clay-like textures.
  • + Accurate isometric perspective and centered composition.
  • The flag is placed below the text, whereas the prompt suggested text at top-center and sushi below it (though the layout is still very pleasing).

Grok Imagine Image

  • + Good adherence to the isometric diorama style.
  • + Correct inclusion of all requested elements (flag, text, sushi).
  • + High visual clarity with sharp shadows.
  • The text rendering is slightly less refined than FLUX.2 [flex]'s.
  • Lighting is a bit harsh compared to the 'gentle lighting' requested.

Verdict: Both models followed the prompt exceptionally well, capturing the isometric miniature style. FLUX.2 [flex] produced a more aesthetically pleasing image with superior 'soft refined textures' and better typography, while Grok Imagine Image provided a more complex sushi plate that also accurately followed the design requirements. FLUX.2 [flex] is the winner for its more professional, clean, and cohesive 3D render look.

Night Sky Transformation

Editing
Edit instruction

“Change the scene to night: a deep, dark sky with subtle, glistening stars visible behind the mountain.”

FLUX.2 [flex]
Grok Imagine Image

AI Judge Analysis

FLUX.2 [flex]

  • + Perfect preservation of the original image details and town layout.
  • + Effective day-to-night lighting transition on the mountainside.
  • + Creative addition of a moon centered behind the peak.
  • Very few stars are visible despite the prompt asking for them.
  • The moon glow is a bit heavy-handed, obscuring the stars in the upper sky.

Grok Imagine Image

  • + Excellent depiction of 'subtle, glistening stars' as requested.
  • + Preserves the composition and structural details of the source image perfectly.
  • + Very realistic night sky texture.
  • Misses the opportunity to add a focal lighting element like the moon in the FLUX.2 [flex] edit (though not explicitly requested).

Verdict: Both models did an exceptional job of preserving the source image while applying the night-time edit. FLUX.2 [flex] added a striking moon effect behind the Matterhorn, but failed to include the glistening stars requested in the prompt. Grok Imagine Image followed the prompt more accurately by filling the sky with stars while maintaining the exact details of the village and terrain, making it the more faithful edit.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

FLUX.2 [flex]
Grok Imagine Image
FLUX.2 [flex]: 100% wins | Ties: 0% | Grok Imagine Image: 0% wins

AI Judge Analysis

FLUX.2 [flex]

  • + Excellent adherence to the 'chasing butterflies' and 'tumbling' motion aspects of the prompt.
  • + Highly realistic fur textures and anatomical proportions for all four animals.
  • + Beautifully rendered god rays and dew sparkles that feel integrated into the scene.
  • The fox kit has slightly unusual dark legs that look more like black paws than a typical fox kit's markings.

Grok Imagine Image

  • + Warm, vibrant color palette with strong backlighting.
  • + Cute, stylized 'expressive eyes' as requested.
  • The animals are static and posing rather than 'playfully chasing and tumbling' as requested.
  • The butterfly rendering is poor, appearing as small white blobs rather than detailed butterflies.
  • The fur texture looks overly smooth and 'AI-processed' compared to the 8K masterpiece request.

Verdict: FLUX.2 [flex] is the clear winner: it captures the dynamic action of the animals chasing butterflies in a realistic meadow, whereas Grok Imagine Image produced a static, posed shot. FLUX.2 [flex] also delivered much higher detail in the fur, background elements, and the butterflies themselves, while Grok Imagine Image struggled with the butterfly details and overall realism.

Heroic Super Hero Portrait

Text-to-Image

“Hyper-photorealistic full-body portrait of a female superhero standing triumphantly on a New York skyscraper rooftop at golden sunset, wearing a classic modest superhero costume with flowing cape, chest emblem, gloves, and boots in red and blue colors, practical design, short hair, strong determined heroic expression looking into the distance, powerful confident stance with hands on hips and cape billowing dramatically in the wind, detailed urban cityscape background, warm natural sunlight with sharp shadows and fabric highlights, ultra-sharp textures on suit, hair, and concrete, 8K masterpiece, empowering family-friendly style.”

FLUX.2 [flex]
Grok Imagine Image
FLUX.2 [flex]: 50% wins | Ties: 50% | Grok Imagine Image: 0% wins

AI Judge Analysis

FLUX.2 [flex]

  • + Excellent full-body composition with high clarity
  • + Highly detailed and realistic urban background of New York
  • + Realistic costume textures and natural lighting
  • The face has a slightly generic AI-beauty look
  • The chest emblem is generic compared to Grok Imagine Image's recognizable iconography

Grok Imagine Image

  • + Stronger dramatic lighting with the sun positioned directly behind the subject
  • + Bold cape physics and more iconic character design
  • + Slightly more realistic skin and facial structure
  • The city background is very blurry and lacks the 'detailed urban cityscape' requested
  • One hand is in a fist instead of 'hands on hips' as requested
  • The hand on the hip has anatomical issues with finger placement

Verdict: FLUX.2 [flex] provides a much better realization of the environment, capturing a detailed and recognizable New York City skyline that perfectly matches the prompt. While Grok Imagine Image has more dramatic lighting and a more iconic feel, it fails to deliver the detailed background requested and has noticeable anatomical flaws in the hands.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

FLUX.2 [flex]
Grok Imagine Image

AI Judge Analysis

FLUX.2 [flex]

  • + Perfect adherence to the 'vintage minimalist' and 'vector emblem' descriptors.
  • + Clean, professional typography that captures a classic Italian cafe aesthetic.
  • + Excellent layout balance with the arched text and banner.
  • The texture on the background is very subtle, almost unnoticeable.
  • The steam icons are slightly thin compared to the rest of the stroke weights.

Grok Imagine Image

  • + Good use of color depth and shading within the cloche icon.
  • + Includes a nice paper-like texture on the light background.
  • + Clear, legible text rendering.
  • Redundant text repeating 'Est. 1720' twice in the layout.
  • The cloche icon has nonsensical additions that look like a spoon and a handle merging into the dome.
  • Less 'minimalist' than requested, feeling more like a modern mascot logo.

Verdict: FLUX.2 [flex] is the clear winner as it perfectly captures the 'minimalist' and 'vector emblem' style requested, producing a clean and professional logo. Grok Imagine Image fails on the minimalist aspect and introduces visual incoherence with strange artifacts protruding from the cloche, as well as repeating the establishment date unnecessarily.

Apollo 11: Journey to Tranquility

Text-to-Image

“Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vector style, crisp lines, consistent iconography, subtle gradients only. Steps (stop at landing): 1. Launch (Saturn V icon) 2. Earth Orbit (Earth + orbit ring icon) 3. Translunar (trajectory arc icon) 4. Lunar Orbit (Moon + orbit ring icon) 5. Descent (lunar module descending icon) 6. Landing (lunar module on the surface icon) Small supporting elements (minimal text): • Crew strip: three silhouette icons with only last names: Armstrong, Aldrin, Collins. • Landing site marker: Moon pin labeled "Tranquility" only. Layout constraints: generous margins, large readable labels, clean background with subtle stars. Vector-only, print-poster look, high resolution.”

FLUX.2 [flex]
Grok Imagine Image
FLUX.2 [flex]: 100% wins | Ties: 0% | Grok Imagine Image: 0% wins

AI Judge Analysis

FLUX.2 [flex]

  • + Excellent layout with clean, professional vector aesthetic
  • + High text legibility and mostly correct spelling
  • + Sophisticated iconography that feels like a real educational poster
  • Missed the final 'Landing' step from the specific prompt list
  • The order of steps (reading vertically vs horizontally) is slightly unconventional

Grok Imagine Image

  • + Successfully included all 6 numbered steps plus a crew section
  • + Very accurate adherence to the specific iconography requests for each step
  • + Creative composition that uses the bottom of the frame as the lunar surface
  • Multiple spelling errors in the text (e.g., '3rajoory', 'Transluiory', 'Moom')
  • Icon for the Saturn V looks less like the actual rocket than the one in the FLUX.2 [flex] poster

Verdict: FLUX.2 [flex] produced a much more professional and aesthetically pleasing infographic that looks like a finished product, though it missed the final step of the requested list. Grok Imagine Image followed the prompt's structural instructions more closely by including all six steps and specific icons, but it suffers from poor text rendering and slightly less refined vector art. FLUX.2 [flex] is the preferred choice for its superior visual quality and clean execution.
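
The headline figures (70.0% / 10.0% ties / 20.0%) appear to be consistent with counting one outcome per challenge. The Python sketch below is an illustration under stated assumptions, not the site's actual scoring method. It assumes a challenge goes to whichever model has the larger win share and counts as a tie when the two shares are equal. Four challenges show no vote split on the page, so their splits here are hypothetical and inferred from the verdict text. The names `SPLITS`, `outcome`, and `headline` are invented for this example.

```python
from collections import Counter

# Per-challenge vote split in percent: (FLUX.2 [flex] wins, ties, Grok Imagine Image wins).
# Entries marked "assumed" have no split on the page; they are inferred from the verdict text.
SPLITS = {
    "Geometric Composition": (100, 0, 0),        # assumed from verdict
    "Man and Car in California": (100, 0, 0),
    "Candid Street Photography": (100, 0, 0),    # assumed from verdict
    "Modern Clean Menu": (0, 0, 100),
    "Isometric Miniature Diorama Scenes": (50, 0, 50),
    "Night Sky Transformation": (0, 0, 100),     # assumed from verdict
    "Adorable Baby Animals in Sunny Meadow": (100, 0, 0),
    "Heroic Super Hero Portrait": (50, 50, 0),
    "Vintage Cafe Logo": (100, 0, 0),            # assumed from verdict
    "Apollo 11: Journey to Tranquility": (100, 0, 0),
}


def outcome(flex: float, ties: float, grok: float) -> str:
    """Assumed rule: the larger win share takes the challenge; equal shares are a tie."""
    if flex > grok:
        return "flex"
    if grok > flex:
        return "grok"
    return "tie"


def headline(splits: dict) -> dict:
    """Percentage of challenges won by each model, plus the percentage tied."""
    counts = Counter(outcome(*split) for split in splits.values())
    total = len(splits)
    return {key: round(100 * counts[key] / total, 1) for key in ("flex", "tie", "grok")}


print(headline(SPLITS))  # {'flex': 70.0, 'tie': 10.0, 'grok': 20.0}
```

Under this rule the 50/50 diorama split is the single tie, and the superhero portrait (50% wins, 50% ties, 0% wins) counts for FLUX.2 [flex], which gives 7 wins, 1 tie, and 2 losses across the 10 challenges.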

FLUX.2 [flex]

Black Forest Labs' precision image generation model, offering fine-grained creative control, reliable text rendering, and output up to 4MP.

Grok Imagine Image

An image generation model by xAI designed to generate highly aesthetic images from text descriptions.