ARENA Leaderboard

See how AI video models stack up against each other.

Which model creates the best videos from text?

Ranked by blind votes in side-by-side matchups. Voters watch the videos, not the model names.

Best AI Models for Text-to-Video

6 models ranked · Last update: June 20, 2026 1:34 AM

#	Model	Developer	Price	¢/generation	Elo
1	Seedance 2.0	ByteDance	$$$$	18.1¢	1185
2	Sora 2 Pro	OpenAI	$$$$	30¢	1180
3	Veo 3.1 Fast	Google	$$$$	10¢	1136
4	Grok Imagine Video	xAI	$$$	5¢	1120
5	Wan 2.6	Alibaba	$$	3¢	1112
6	P-Video	PrunaAI	$$	2¢	1107

As of June 2026, Seedance 2.0 leads with 1185 Elo and a 72.7% win rate, just 5 points ahead of Sora 2 Pro (1180 Elo). The top tier is characterized by premium pricing and slow generation speeds, while Alibaba's Wan 2.6 (1112 Elo) takes fifth place at a steep discount, costing 83% less per generation than the top-ranked model (3¢ versus 18.1¢). At the budget end, P-Video (1107 Elo) ranks sixth while delivering the fastest relative speeds in the tier.

[Chart: Elo vs Cost]

[Chart: Elo vs Speed. 3 models waiting for enough speed data.]


Challenges

Neon Rain Reverie Text-to-Video

Prompt

“Hyper-realistic cinematic video of an elegant young woman in a flowing white silk dress dancing gracefully in heavy pouring rain at night on a neon-lit Tokyo street. Her long wet hair whips dramatically in the wind, the dress clings and flows with realistic fabric and water physics, raindrops splash and create perfect reflections of pink and blue neon signs on the wet pavement. Subtle emotional expression of freedom mixed with melancholy on her face, water droplets on skin and eyelashes catching the light. Smooth dynamic orbiting camera with slight cinematic handheld feel, dramatic volumetric lighting with god rays piercing through the rain, photorealistic, 8K, film grain, shallow depth of field, anamorphic lens flare.”

Top 3

Seedance 2.0

Kling V3 Pro

Kling V3 Omni Pro

Bottom 3

Grok Imagine Video

P-Video

Sora 2 Pro

6 models competed


The Will Smith Spaghetti Challenge Text-to-Video

Prompt

“A medium shot of Will Smith sitting at a cozy, dimly lit Italian restaurant, twirling a forkful of spaghetti with a playful grin on his face. The camera slowly dollies in as he takes a bite, capturing the rich, steaming sauce. Warm candlelight, cinematic lighting, hyperrealistic, 4K. Smooth motion, consistent face, detailed hands, natural spaghetti physics.”

Top 3

Kling V3 Omni Pro

Grok Imagine Video

Wan 2.6

4 models competed


The Soul Gauntlet Text-to-Video

Prompt

“Extreme cinematic close-up of a beautiful young woman experiencing deep, raw emotion. Her expression slowly shifts from quiet sorrow to intense cathartic crying — realistic skin texture with visible pores, subtle muscle twitches, glistening tears forming in her eyes and rolling down her cheeks, red-rimmed eyes with natural blinking and micro-expressions of pain and release. Soft dramatic side lighting with gentle rim light highlighting the tears, very shallow depth of field, slight emotional camera push-in during the emotional peak, photorealistic, 8K, intricate skin and eye details, filmic color grading, subtle film grain.”

Top 3

Kling V3 Omni Pro

Kling V3 Pro

Veo 3.1 Fast

Bottom 3

Sora 2 Pro

P-Video

Grok Imagine Video

6 models competed


The Rubik's Gauntlet Text-to-Video

Prompt

“Hyper-realistic cinematic close-up of a professional speedcuber solving a 3x3 Rubik's Cube at world-record pace. His hands move with insane precision and blistering speed — fingers flying across the glossy colored faces in a complex sequence of advanced algorithms, rapid twists, and smooth layer turns. The cube rotates with perfect realistic physics, slight motion blur on fast turns, and flawless color consistency as it progresses toward a solved state. Subtle sweat glistening on skin, visible veins, hyper-detailed fingerprints and nail textures. Intense focused facial expression with micro-expressions of concentration in shallow depth of field. Dramatic cinematic side lighting with strong specular highlights and reflections dancing across the cube surfaces and skin. Smooth slow orbiting camera that circles the hands and cube, capturing every intricate finger movement from dynamic angles. Photorealistic, 8K, subtle film grain, anamorphic lens flare, moody intense atmosphere, 24fps.”

Top 3

Sora 2 Pro

Kling V3 Omni Pro

Kling V3 Pro

Bottom 3

Seedance 2.0

Veo 3.1 Fast

Grok Imagine Video

6 models competed


FAQ

What is the best AI text-to-video model?

Based on blind community voting, Seedance 2.0 is currently the #1-ranked AI text-to-video model, with an Elo rating of 1185. Rankings update in real time as new votes come in.

How are AI text-to-video models ranked on Lumenfall?

Lumenfall Arena ranks AI models through blind community voting. In each matchup, two models generate a video from the same prompt, and voters pick the better result without seeing the model names. Votes are processed with TrueSkill, a Bayesian rating algorithm developed by Microsoft Research, and the result is displayed as a single Elo-style score reflecting each model's relative quality.
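To make the rating step concrete, here is a minimal sketch of how a single blind vote could update two models under the standard two-player, no-draw TrueSkill equations. The starting values (mu = 25, sigma = 25/3, beta = 25/6, tau = 25/300) are the defaults from the original TrueSkill formulation and are an assumption here, since Lumenfall does not publish its parameters. The `Rating` class and `update` function are hypothetical names used for illustration only.

```python
import math
from dataclasses import dataclass

# Assumed defaults from the original TrueSkill formulation.
# Lumenfall's actual parameters are not stated on this page.
MU0 = 25.0
SIGMA0 = MU0 / 3    # initial uncertainty
BETA = MU0 / 6      # per-matchup performance noise
TAU = MU0 / 300     # small skill drift added before each update


@dataclass
class Rating:
    mu: float = MU0
    sigma: float = SIGMA0


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update(winner: Rating, loser: Rating) -> tuple[Rating, Rating]:
    """Return new (winner, loser) ratings after one vote with no draws."""
    # Inflate variances slightly so uncertainty never collapses to zero.
    var_w = winner.sigma ** 2 + TAU ** 2
    var_l = loser.sigma ** 2 + TAU ** 2
    c = math.sqrt(2 * BETA ** 2 + var_w + var_l)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)   # larger when the result was an upset
    w = v * (v + t)         # fraction of uncertainty to remove
    new_winner = Rating(winner.mu + var_w / c * v,
                        math.sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_loser = Rating(loser.mu - var_l / c * v,
                       math.sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_winner, new_loser


# Two brand-new models meet; the first one wins the vote.
model_a, model_b = update(winner=Rating(), loser=Rating())
```

With these assumed defaults, the winner's mu rises to roughly 29.2, the loser's falls by the same amount, and both sigmas shrink because the vote added information about each model.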

What is an Elo rating for AI models?

An Elo rating is a numerical score representing a model's skill relative to other models. Under the hood, Lumenfall uses TrueSkill, which tracks two values per model: mu (estimated skill) and sigma (uncertainty). The displayed Elo is calculated as 1000 + 10 × (mu - 3 × sigma), a conservative lower bound on the model's skill. Because sigma shrinks only as votes accumulate, a model must prove itself consistently across many matchups to earn a high rating.
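The displayed-score formula can be sketched in a few lines. The function name `displayed_elo` and the sample mu and sigma values below are hypothetical, chosen only to show how uncertainty pulls the score down.

```python
def displayed_elo(mu: float, sigma: float) -> float:
    """Conservative score from the FAQ formula: 1000 + 10 * (mu - 3 * sigma)."""
    return 1000 + 10 * (mu - 3 * sigma)


# Hypothetical models with the same estimated skill but different uncertainty.
well_tested = displayed_elo(mu=40.0, sigma=2.0)   # 1340.0
barely_seen = displayed_elo(mu=40.0, sigma=7.0)   # 1190.0
```

Both hypothetical models have the same estimated skill, yet the one with fewer matchups behind it (larger sigma) shows a score 150 points lower. If a new model started from the common TrueSkill defaults of mu = 25 and sigma = 25/3 (an assumption, not something the page states), its displayed score would begin at about 1000.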
