ARENA Leaderboard

See how AI image models stack up against each other.

Your vote decides the leaderboard. Pick the better video in blind matchups. Results update rankings in real time.

Which model creates the best videos from text?

Ranked by blind votes in side-by-side matchups. Voters watch the videos, not the model names.

Best AI Models for Text To Video

3 models ranked · Last update: May 5, 2026 5:41 PM
#  Model               Provider  Elo
1  Sora 2 Pro          OpenAI    1187
2  P-Video             PrunaAI   1115
3  Grok Imagine Video  xAI       1075

As of May 2026, OpenAI’s Sora 2 Pro leads the leaderboard with an Elo of 1187 and a 46.4% win rate, 72 points ahead of its closest rival. PrunaAI’s P-Video holds second place with an Elo of 1115, and xAI’s Grok Imagine Video ranks third with an Elo of 1075 and a lower 17.6% win rate. Despite the gap at the top, P-Video stands out on price-to-performance, rivaling the leader's quality at 1/15th the cost per generation.

Elo vs Speed

Speed data is still warming up

We only have enough recent requests for Grok Imagine Video (44.6s average).

2 models waiting for enough speed data

Challenges

Neon Rain Reverie Text-to-Video

Hyper-realistic cinematic video of an elegant young woman in a flowing white silk dress dancing gracefully in heavy pouring rain at night on a neon-lit Tokyo street. Her long wet hair whips dramatically in the wind, the dress clings and flows with realistic fabric and water physics, raindrops splash and create perfect reflections of pink and blue neon signs on the wet pavement. Subtle emotional expression of freedom mixed with melancholy on her face, water droplets on skin and eyelashes catching the light. Smooth dynamic orbiting camera with slight cinematic handheld feel, dramatic volumetric lighting with god rays piercing through the rain, photorealistic, 8K, film grain, shallow depth of field, anamorphic lens flare.
Top 3
1. Kling V3 Pro
2. Kling V3 Omni Pro
3. Grok Imagine Video

The Soul Gauntlet Text-to-Video

Extreme cinematic close-up of a beautiful young woman experiencing deep, raw emotion. Her expression slowly shifts from quiet sorrow to intense cathartic crying — realistic skin texture with visible pores, subtle muscle twitches, glistening tears forming in her eyes and rolling down her cheeks, red-rimmed eyes with natural blinking and micro-expressions of pain and release. Soft dramatic side lighting with gentle rim light highlighting the tears, very shallow depth of field, slight emotional camera push-in during the emotional peak, photorealistic, 8K, intricate skin and eye details, filmic color grading, subtle film grain.
Top 3
1. Kling V3 Omni Pro
2. P-Video
3. Veo 3.1 Fast

The Rubik's Gauntlet Text-to-Video

Hyper-realistic cinematic close-up of a professional speedcuber solving a 3x3 Rubik's Cube at world-record pace. His hands move with insane precision and blistering speed — fingers flying across the glossy colored faces in a complex sequence of advanced algorithms, rapid twists, and smooth layer turns. The cube rotates with perfect realistic physics, slight motion blur on fast turns, and flawless color consistency as it progresses toward a solved state. Subtle sweat glistening on skin, visible veins, hyper-detailed fingerprints and nail textures. Intense focused facial expression with micro-expressions of concentration in shallow depth of field. Dramatic cinematic side lighting with strong specular highlights and reflections dancing across the cube surfaces and skin. Smooth slow orbiting camera that circles the hands and cube, capturing every intricate finger movement from dynamic angles. Photorealistic, 8K, subtle film grain, anamorphic lens flare, moody intense atmosphere, 24fps.
Top 3
1. Sora 2 Pro
2. Kling V3 Omni Pro
3. Kling V3 Pro

FAQ

What is the best AI text to video model?

Based on blind community voting, Sora 2 Pro is currently the #1 ranked AI text to video model with an Elo rating of 1187. Rankings update in real time as new votes come in.

How are AI text to video models ranked on Lumenfall?

Lumenfall Arena ranks AI models through blind community voting. In each matchup, two models generate from the same prompt and voters pick the better result without seeing model names. Votes are processed using TrueSkill, a Bayesian rating algorithm developed by Microsoft Research, and the result is converted into a single Elo score reflecting each model's relative quality.
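
As a rough illustration of how a single blind vote could move two ratings, here is a minimal sketch of the standard two-player TrueSkill update with no draws. It is not Lumenfall's actual code. The function name `update_after_vote` and the `BETA` value (TrueSkill's commonly published default of 25/6) are assumptions, and the sketch leaves out TrueSkill's dynamics factor and draw margin.

```python
import math

# Assumed performance-noise parameter (TrueSkill's published default).
# Lumenfall's real settings are not stated in the source.
BETA = 25.0 / 6.0


def _pdf(x: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_after_vote(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs for the winner and loser of one vote.

    Hypothetical helper: each argument is a (mu, sigma) tuple.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser

    # Total uncertainty about the performance difference.
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c

    # Correction terms for a win with zero draw margin.
    v = _pdf(t) / _cdf(t)
    w = v * (v + t)

    # The winner's mean rises, the loser's falls, and both uncertainties shrink.
    new_winner = (
        mu_w + (sigma_w ** 2 / c) * v,
        sigma_w * math.sqrt(1.0 - (sigma_w ** 2 / c ** 2) * w),
    )
    new_loser = (
        mu_l - (sigma_l ** 2 / c) * v,
        sigma_l * math.sqrt(1.0 - (sigma_l ** 2 / c ** 2) * w),
    )
    return new_winner, new_loser


# Two models starting from the same assumed prior (mu = 25, sigma = 25/3).
prior = (25.0, 25.0 / 3.0)
model_a, model_b = update_after_vote(prior, prior)
```

An upset (a low-rated model beating a high-rated one) makes `t` negative, which increases `v` and so produces a larger shift than an expected result.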

What is an Elo rating for AI models?

An Elo rating is a numerical score representing a model's skill relative to other models. Under the hood, Lumenfall uses TrueSkill, which tracks two values per model: mu (estimated skill) and sigma (uncertainty). The displayed Elo is calculated as 1000 + 10 × (mu − 3 × sigma), a conservative lower bound on the model's skill. Because high uncertainty lowers the score, a model must prove itself consistently across many matchups to earn a high rating.
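
The displayed-score formula can be written as a short function. This is a minimal sketch of the arithmetic stated above. The function name `displayed_elo` and the sample mu and sigma values are made up for illustration and are not taken from any model on the leaderboard.

```python
def displayed_elo(mu: float, sigma: float) -> float:
    """Conservative displayed score: 1000 + 10 * (mu - 3 * sigma)."""
    return 1000.0 + 10.0 * (mu - 3.0 * sigma)


# Hypothetical values: the same estimated skill with different uncertainty.
well_tested = displayed_elo(mu=30.0, sigma=2.0)  # 1000 + 10 * (30 - 6)
new_entrant = displayed_elo(mu=30.0, sigma=5.0)  # 1000 + 10 * (30 - 15)
```

With the same mu, the model with the smaller sigma gets the higher displayed score (1240 versus 1150 here), which is why ratings tend to climb as a model accumulates votes.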
