Grok Imagine Image (xAI) vs. Imagen 4.0 Ultra Generate 001 (Google)

Settled by community votes across 7 shared challenges, with an AI judge weighing in on each.

Grok Imagine Image

24.1 arena score

#19 of 44 in Text-to-Image

Skill signature

Not enough comparable category data

The chart appears once both models have ratings across at least three shared arena categories.

Imagen 4.0 Ultra Generate 001

22.3 arena score

#28 of 44 in Text-to-Image

Vote tally

Where the votes landed

Grok Imagine Image

83.3%

win rate

Ties

16.7%

Imagen 4.0 Ultra Generate 001

0.0%

win rate


Shared challenges 7
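The tally percentages are shares of all votes cast across the shared challenges. The page does not publish raw vote counts, so the ballots below are hypothetical: they are simply one small set of votes that happens to reproduce the published figures. A minimal Python sketch of the arithmetic, with `tally_shares` as an illustrative helper rather than anything the arena exposes:

```python
from collections import Counter

def tally_shares(votes, outcomes):
    """Return each outcome's share of the votes as a percentage (one decimal)."""
    total = len(votes)
    if total == 0:
        # No votes cast: report zero for every outcome instead of dividing by zero.
        return {outcome: 0.0 for outcome in outcomes}
    counts = Counter(votes)  # missing outcomes count as 0
    return {outcome: round(100 * counts[outcome] / total, 1) for outcome in outcomes}

# Hypothetical ballots (an assumption, not data from the page):
# five votes for Grok Imagine Image, one tie, none for Imagen 4.0 Ultra.
votes = ["grok"] * 5 + ["tie"]
shares = tally_shares(votes, outcomes=("grok", "tie", "imagen"))
print(shares)
```

Under this assumption the shares come out to 83.3% for Grok Imagine Image, 16.7% ties, and 0.0% for Imagen 4.0 Ultra Generate 001, matching the tally above. Other vote counts in the same 5:1:0 ratio would give the same percentages.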

Challenge by challenge

The strongest take from each model on every shared challenge, with the AI judge's read.

Geometric Composition

Text-to-Image

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, partially visible through the glass. Soft window light from the left.”

Grok Imagine Image

Imagen 4.0 Ultra Generate 001

Community votes: Grok Imagine Image 100% wins, 0% ties, Imagen 4.0 Ultra Generate 001 0% wins

AI Judge Analysis

Grok Imagine Image

+ Excellent refractive properties in the glass cube
+ Highly realistic wood texture and lighting
+ Accurate depiction of a plant seen through glass

− The blue sphere appears to be floating mid-air inside a hollow glass frame rather than a solid cube

Imagen 4.0 Ultra Generate 001

+ Excellent text rendering on the book's spine
+ Clean, sharp geometry for the glass cube
+ Good adherence to the spatial arrangement requested

− The reflection/refraction of the blue sphere is physically inconsistent
− The glass cube's back edge disappears unnaturally into the plant

Verdict: Both models followed the complex spatial instructions well. Grok Imagine produced a more photographically realistic scene with superior textures and lighting, whereas Imagen 4.0 Ultra provided better detail on the book (text) but suffered from more noticeable optical inconsistencies in the glass. Grok Imagine is the winner for its more cohesive and realistic visual quality.

Candid Street Photography

Text-to-Image

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm lens, natural skin texture, imperfect framing, motion blur from passing cars, cinematic but realistic, no stylization.”

Grok Imagine Image

Imagen 4.0 Ultra Generate 001

Community votes: Grok Imagine Image 100% wins, 0% ties, Imagen 4.0 Ultra Generate 001 0% wins

AI Judge Analysis

Grok Imagine Image

+ Excellent depiction of motion blur from passing cars
+ High authenticity in the 'candid' and 'imperfect framing' request
+ Realistic street lighting and reflections

− The subject's face is obscured by a mask
− The bicycle geometry is slightly warped near the handlebars

Imagen 4.0 Ultra Generate 001

+ Superb skin texture and facial detail
+ Clearer depiction of the repair activity
+ Strong adherence to the 'light rain' and 'elderly man' components

− Lacks the requested motion blur from passing cars
− The wet pavement effect looks more like scattered petals or noise than rain reflections
− The composition feels more like a portrait than a candid street photo

Verdict: Grok Imagine captured the 'candid' and 'motion blur' requirements of the prompt much better, resulting in a more cinematic and realistic street photography look. While Imagen 4.0 Ultra has superior facial detail and skin texture, it failed to incorporate the motion blur, and its rain reflections look less convincing than the atmospheric wet pavement in Grok's version.

Modern Clean Menu

Text-to-Image

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif fonts, vibrant accents, clean professional layout for casual dining.”

Grok Imagine Image

Imagen 4.0 Ultra Generate 001

AI Judge Analysis

Grok Imagine Image

+ Excellent typographic hierarchy with clear section headers.
+ Sophisticated use of negative space and scattered food elements.
+ Visual consistency in the food photography styles.

− Violates the 'grid' request by using a more organic, scattered layout.
− Repetitive placeholder text (e.g., 'Steak Frites' appears three times).

Imagen 4.0 Ultra Generate 001

+ Perfectly follows the 'grid' requirement from the prompt.
+ Each food item includes a price, enhancing the realism of the menu.
+ Clean, modern aesthetic with consistent color-coded tags.

− Text rendering is very poor with several illegible gibberish words.
− The 'Mains' section header is awkwardly placed and lacks the weight of other headers.

Verdict: Imagen 4.0 Ultra followed the structural prompt more closely by implementing a clear grid layout, but it suffered from significant text artifacts and illegible fonts. Grok Imagine failed the 'grid' instruction but produced a much more professional and aesthetically pleasing design that looks like a high-end printable menu with superior typography.

Isometric Miniature Diorama Scenes

Text-to-Image

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR materials, gentle lighting, on a small raised diorama base with minimal garnish and plate. Solid light blue background. At top-center: 'JAPAN' in large bold text, 'SUSHI' below it, small flag icon. Perfectly centered, ultra-clean, high-clarity, square format.”

Grok Imagine Image

Imagen 4.0 Ultra Generate 001

Community votes: Grok Imagine Image 50% wins, 50% ties, Imagen 4.0 Ultra Generate 001 0% wins

AI Judge Analysis

Grok Imagine Image

+ Perfect text rendering and alignment.
+ Clean, bright, and modern 3D aesthetic.
+ Excellent adherence to the 'raised diorama base' and 'solid light blue background' instructions.

− The sushi variety is a bit repetitive compared to real sushi platters.
− Shadows are a bit harsh for 'gentle lighting'.

Imagen 4.0 Ultra Generate 001

+ Higher detail in the food textures (especially the rice and fish grain).
+ More creative and realistic sushi assortment including nigiri and gunkan.
+ The 'soft refined textures' are more apparent in the modeling.

− The 'SUSHI' text is slightly off-center and the flag is placed beside the text rather than below 'JAPAN'.
− The garnish is slightly more complex than the requested 'minimal garnish'.

Verdict: Both models followed the prompt exceptionally well, but Grok Imagine Image edges out a win for superior composition and typography. While Imagen 4.0 Ultra provided more detailed food modeling and variety, Grok Imagine Image delivered a cleaner, more perfectly centered 'ultra-clean' aesthetic that better matched the graphic design requirements of the prompt.

Adorable Baby Animals in Sunny Meadow

Text-to-Image

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ultra-detailed soft fur, playfully chasing butterflies and tumbling together in a lush wildflower meadow, warm golden sunrise light with god rays and dew sparkles, joyful wholesome vibe, 8K masterpiece.”

Grok Imagine Image

Imagen 4.0 Ultra Generate 001

AI Judge Analysis

Grok Imagine Image

+ Captures the 'god rays' lighting effect very effectively with strong directional beams.
+ Features a very soft, photorealistic rendering style for the animal fur.
+ Good use of depth of field with blurred foreground flowers.

− Animals are mostly sitting still rather than 'playfully chasing' or 'tumbling' as requested.
− The butterflies are small and less prominent compared to the request.
− The kitten's markings look slightly more like a wildcat than a standard tabby kitten.

Imagen 4.0 Ultra Generate 001

+ Perfectly captures the 'playfully chasing' and 'tumbling' action from the prompt.
+ Includes many colorful, detailed butterflies that the animals are actively interacting with.
+ Excellent rendering of 'dew sparkles' on the grass in the foreground.

− The style leans slightly more towards 'digital art' or 'high-end 3D' rather than strict photorealism.
− The fox's front paws look a bit anatomically stiff.
− The lighting is bright but lacks the specific 'god ray' shafts present in the first image.

Verdict: Grok Imagine Image provides a more atmospheric and photorealistic lighting setup with beautiful god rays, but the animals are static and the 'chasing' action is missing. Imagen 4.0 Ultra Generate 001 much better adheres to the narrative elements of the prompt, showing the animals in motion and interacting with butterflies, even if the style feels slightly more like a 3D render than a real photo.

Vintage Cafe Logo

Text-to-Image

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cream tones, subtle texture on light background, vector emblem style.”

Grok Imagine Image

Imagen 4.0 Ultra Generate 001

Community votes: Grok Imagine Image 100% wins, 0% ties, Imagen 4.0 Ultra Generate 001 0% wins

AI Judge Analysis

Grok Imagine Image

+ Excellent, bold typography with correct accent usage on 'Caffè'.
+ Clean vector style with high contrast and effective use of negative space.
+ Captures a premium, modern-vintage restaurant aesthetic.

− Repeats the 'Est. 1720' text twice, which looks redundant.
− Includes a strange cup/handle-like shape protruding from the side of the cloche.

Imagen 4.0 Ultra Generate 001

+ Perfect adherence to the 'banner' requirement for the date.
+ Refined, minimalist line work that fits the 'vintage vector' prompt perfectly.
+ Better balanced composition with a more authentic minimalist feel.

− Text rendering is slightly lighter and less impactful than the other model.
− The steam is very thin, almost getting lost in the background.

Verdict: Imagen 4.0 Ultra is the winner because it followed the specific layout instructions more accurately, particularly the 'banner' for the establishment date which Grok Imagine simplified into a ribbon shape and then repeated. Imagen 4.0 Ultra also captured a more cohesive minimalist vintage aesthetic with its delicate line work and balanced proportions, whereas Grok Imagine included an unrequested and awkward handle-like artifact on the cloche.

Apollo 11: Journey to Tranquility

Text-to-Image

“Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vector style, crisp lines, consistent iconography, subtle gradients only. Steps (stop at landing): 1. Launch (Saturn V icon) 2. Earth Orbit (Earth + orbit ring icon) 3. Translunar (trajectory arc icon) 4. Lunar Orbit (Moon + orbit ring icon) 5. Descent (lunar module descending icon) 6. Landing (lunar module on the surface icon) Small supporting elements (minimal text): • Crew strip: three silhouette icons with only last names: Armstrong, Aldrin, Collins. • Landing site marker: Moon pin labeled "Tranquility" only. Layout constraints: generous margins, large readable labels, clean background with subtle stars. Vector-only, print-poster look, high resolution.”

Grok Imagine Image

Imagen 4.0 Ultra Generate 001

Community votes: Grok Imagine Image 100% wins, 0% ties, Imagen 4.0 Ultra Generate 001 0% wins

AI Judge Analysis

Grok Imagine Image

+ Successfully followed the specific 6-step sequence requested.
+ Correctly rendered most text and names (Armstrong, Aldrin, Collins).
+ Maintained a clean, modern flat-vector aesthetic with the requested color palette.

− Nonsense text artifacts in the middle section (e.g., '3rajoory').
− The Saturn V rocket design is a bit generic and stylized compared to real-world reference.

Imagen 4.0 Ultra Generate 001

+ Very professional layout with a central hub-and-spoke design.
+ Strong adherence to the requested NASA color palette and clean vector lines.

− Failed completely on the 6-step content requirement, using gibberish text instead.
− Included repetitive 'Apollo 1' and 'Apollo 11' labels that don't match the prompt's logical flow.
− Icons are abstract and do not clearly represent the specific mission phases requested.

Verdict: Grok Imagine followed the complex instructions much better, including all 6 specific steps of the mission and the names of the crew members with relatively high legibility. Imagen 4.0 Ultra produced a nice layout, but the text is almost entirely nonsensical gibberish and it failed to include the requested steps (Launch, Translunar, etc.) in a meaningful way.