xAI's premium image generation model, offering higher-fidelity output and stronger performance on single-image editing benchmarks than the standard Grok Imagine model
Settled by community votes across 1 shared challenge, with an AI judge weighing in on each.
Grok Imagine Image Pro
#14 of 44 in Text-to-Image
Not enough comparable category data
The chart appears once both models have ratings across at least three shared arena categories.
Stable Diffusion 3.5 Medium
#41 of 44 in Text-to-Image
Where the votes landed
Grok Imagine Image Pro: 100.0% win rate
Ties: 0.0%
Stable Diffusion 3.5 Medium: 0.0% win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
The Reversed Rodeo
Text-to-Image: “Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”
AI Judge Analysis
Grok Imagine Image Pro
- + Excellent adherence to the 'horse on top' spatial instruction.
- + Vibrant and cinematic color palette with detailed nebulae.
- + Strong sense of motion and surreal composition.
Stable Diffusion 3.5 Medium
- + Natural-looking lighting on the horse and astronaut.
- + Clean rendering of the Earth and space environment.
- − Failed the core prompt instruction of 'horse on top', instead showing the astronaut riding the horse.
- − Significant anatomical issues with the horse's legs, which appear distorted and multiplied.
- − Less imaginative interpretation of the 'surreal' aspect.
Verdict: Grok Imagine Image Pro successfully followed the complex spatial instruction to place the horse on top of the astronaut, creating a truly surreal and cinematic image. In contrast, Stable Diffusion 3.5 Medium defaulted to a standard 'astronaut riding a horse' composition and suffered from notable anatomical artifacts in the horse's legs.
Explore each model
Stability AI's 2.5-billion-parameter text-to-image model, built on an improved Multimodal Diffusion Transformer (MMDiT-X) and optimized for consumer hardware, featuring improved image quality, typography, and complex prompt understanding