Grok Imagine Video

AI Video Generation Model

Video #5 $$$ · 5¢

xAI's video generation model, based on the Aurora architecture. It supports text-to-video, image-to-video, and video editing, with native audio-visual synthesis at up to 720p.

Grok Imagine Video Benchmarks

Grok Imagine Video is ranked #5 in Text-to-Video on the Lumenfall Arena, with an Elo rating of 1096. In the arena, real users pick the better video in blind comparisons. These rankings are based on 3 blind-vote competitions.

Lumenfall Arena: #5 in Text-to-Video · 1096 Elo

Text-to-Video Landscape (Elo vs. speed chart; 2 models are still waiting for enough speed data)

Competition Results

Uncategorized

#4 · Neon Rain Reverie · Text-to-Video · 6 models
Prompt:

“Hyper-realistic cinematic video of an elegant young woman in a flowing white silk dress dancing gracefully in heavy pouring rain at night on a neon-lit Tokyo street. Her long wet hair whips dramatically in the wind, the dress clings and flows with realistic fabric and water physics, raindrops splash and create perfect reflections of pink and blue neon signs on the wet pavement. Subtle emotional expression of freedom mixed with melancholy on her face, water droplets on skin and eyelashes catching the light. Smooth dynamic orbiting camera with slight cinematic handheld feel, dramatic volumetric lighting with god rays piercing through the rain, photorealistic, 8K, film grain, shallow depth of field, anamorphic lens flare.”

#5 · The Rubik's Gauntlet · Text-to-Video · 6 models
Prompt:

“Hyper-realistic cinematic close-up of a professional speedcuber solving a 3x3 Rubik's Cube at world-record pace. His hands move with insane precision and blistering speed — fingers flying across the glossy colored faces in a complex sequence of advanced algorithms, rapid twists, and smooth layer turns. The cube rotates with perfect realistic physics, slight motion blur on fast turns, and flawless color consistency as it progresses toward a solved state. Subtle sweat glistening on skin, visible veins, hyper-detailed fingerprints and nail textures. Intense focused facial expression with micro-expressions of concentration in shallow depth of field. Dramatic cinematic side lighting with strong specular highlights and reflections dancing across the cube surfaces and skin. Smooth slow orbiting camera that circles the hands and cube, capturing every intricate finger movement from dynamic angles. Photorealistic, 8K, subtle film grain, anamorphic lens flare, moody intense atmosphere, 24fps.”

#6 · The Soul Gauntlet · Text-to-Video · 6 models
Prompt:

“Extreme cinematic close-up of a beautiful young woman experiencing deep, raw emotion. Her expression slowly shifts from quiet sorrow to intense cathartic crying — realistic skin texture with visible pores, subtle muscle twitches, glistening tears forming in her eyes and rolling down her cheeks, red-rimmed eyes with natural blinking and micro-expressions of pain and release. Soft dramatic side lighting with gentle rim light highlighting the tears, very shallow depth of field, slight emotional camera push-in during the emotional peak, photorealistic, 8K, intricate skin and eye details, filmic color grading, subtle film grain.”

Help rank Grok Imagine Video: pick the better video in blind matchups. Results update the rankings in real time.