“Hyper-realistic cinematic video of an elegant young woman in a flowing white silk dress dancing gracefully in heavy pouring rain at night on a neon-lit Tokyo street. Her long wet hair whips dramatically in the wind, the dress clings and flows with realistic fabric and water physics, raindrops splash and create perfect reflections of pink and blue neon signs on the wet pavement. Subtle emotional expression of freedom mixed with melancholy on her face, water droplets on skin and eyelashes catching the light. Smooth dynamic orbiting camera with slight cinematic handheld feel, dramatic volumetric lighting with god rays piercing through the rain, photorealistic, 8K, film grain, shallow depth of field, anamorphic lens flare.”
Grok Imagine Video is xAI's video generation model, based on the Aurora architecture. It supports text-to-video, image-to-video, and video editing, with native audio-visual synthesis at up to 720p.
Grok Imagine Video Benchmarks
Grok Imagine Video is ranked #1 in Text-to-Video with an Elo rating of 1076 on the Lumenfall Arena, where real users pick the better output in blind comparisons. This ranking is based on 2 blind-vote competitions.
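This page does not spell out how Lumenfall turns blind votes into ratings. The sketch below shows the standard Elo update for a single pairwise vote, assuming a K-factor of 32 and a starting rating of 1000. Both values are hypothetical illustration choices, not confirmed Arena settings, and the function names are made up for this example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A wins a vote against model B (standard logistic Elo)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return new (A, B) ratings after one blind vote.

    k=32 is an assumed K-factor, not a documented Lumenfall parameter.
    """
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0          # actual score for A: 1 = win, 0 = loss
    new_a = rating_a + k * (s_a - e_a)   # A moves toward its actual result
    new_b = rating_b + k * ((1.0 - s_a) - (1.0 - e_a))  # B gets the mirror-image change
    return new_a, new_b


# Two equally rated models (hypothetical start of 1000): the winner gains 16 points.
print(update(1000.0, 1000.0, a_won=True))  # (1016.0, 984.0)
```

Under this model, one vote moves a rating by less than K points, so a rating backed by only two competitions can still shift noticeably as more votes arrive.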
Text-to-Video Landscape
Elo vs Cost
Elo vs Speed
Speed data is still warming up
So far, we only have enough recent requests to report a speed for Grok Imagine Video (384 ms average).
Competition Results
Uncategorized
“Hyper-realistic cinematic close-up of a professional speedcuber solving a 3x3 Rubik's Cube at world-record pace. His hands move with insane precision and blistering speed — fingers flying across the glossy colored faces in a complex sequence of advanced algorithms, rapid twists, and smooth layer turns. The cube rotates with perfect realistic physics, slight motion blur on fast turns, and flawless color consistency as it progresses toward a solved state. Subtle sweat glistening on skin, visible veins, hyper-detailed fingerprints and nail textures. Intense focused facial expression with micro-expressions of concentration in shallow depth of field. Dramatic cinematic side lighting with strong specular highlights and reflections dancing across the cube surfaces and skin. Smooth slow orbiting camera that circles the hands and cube, capturing every intricate finger movement from dynamic angles. Photorealistic, 8K, subtle film grain, anamorphic lens flare, moody intense atmosphere, 24fps.”