What is the best AI text-to-video model?

Based on blind community voting, Sora 2 Pro is currently the #1 ranked AI text-to-video model with an Elo rating of 1249. Rankings update in real time as new votes come in.

How are AI text-to-video models ranked on Lumenfall?

Lumenfall Arena ranks AI models through blind community voting. In each matchup, two models generate from the same prompt, and voters pick the better result without seeing model names. Votes are processed with TrueSkill, a Bayesian rating algorithm developed by Microsoft Research, and the result is displayed as a single Elo score reflecting each model's relative quality.
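To make the voting step concrete, here is a minimal sketch of how one blind vote could move two ratings under the standard two-player, no-draw TrueSkill update. The starting values (mu = 25, sigma = 25/3, beta = 25/6) are the defaults from the original TrueSkill work and are assumptions here; Lumenfall's actual parameters and implementation are not stated. The names `Rating` and `update_after_vote` are hypothetical.

```python
import math
from statistics import NormalDist
from typing import NamedTuple, Tuple

_STD_NORMAL = NormalDist()  # standard normal, used for its pdf and cdf


class Rating(NamedTuple):
    mu: float     # estimated skill
    sigma: float  # uncertainty about that estimate


# Assumed defaults; the values Lumenfall uses are not published in this page.
MU0 = 25.0
SIGMA0 = MU0 / 3
BETA = MU0 / 6  # per-matchup performance noise


def update_after_vote(winner: Rating, loser: Rating,
                      beta: float = BETA) -> Tuple[Rating, Rating]:
    """Return (new_winner, new_loser) after one vote, with draws not modeled."""
    c = math.sqrt(2 * beta**2 + winner.sigma**2 + loser.sigma**2)
    t = (winner.mu - loser.mu) / c
    v = _STD_NORMAL.pdf(t) / _STD_NORMAL.cdf(t)  # scales the shift in mu
    w = v * (v + t)                              # scales the shrink in sigma

    def apply(r: Rating, sign: int) -> Rating:
        var = r.sigma**2
        new_mu = r.mu + sign * (var / c) * v
        new_sigma = math.sqrt(var * (1 - (var / c**2) * w))
        return Rating(new_mu, new_sigma)

    return apply(winner, +1), apply(loser, -1)


# Two unrated models meet once; the first one wins the vote.
a, b = update_after_vote(Rating(MU0, SIGMA0), Rating(MU0, SIGMA0))
```

After the vote, the winner's mu rises, the loser's mu falls by the same amount, and both sigmas shrink because the matchup added information about each model.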

What is an Elo rating for AI models?

An Elo rating is a numerical score representing a model's skill relative to other models. Under the hood, Lumenfall uses TrueSkill, which tracks two values per model: mu (estimated skill) and sigma (uncertainty). The displayed Elo is calculated as 1000 + 10 × (mu - 3 × sigma), a conservative lower bound on skill. Because sigma shrinks only as votes accumulate, a model must prove itself consistently across many matchups to earn a high rating.
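The formula above is easy to check by hand. The sketch below applies it to two made-up models with the same estimated skill but different uncertainty; the mu and sigma values are illustrative and not taken from the leaderboard, and the function name `displayed_elo` is hypothetical.

```python
def displayed_elo(mu: float, sigma: float) -> float:
    """Conservative display score: 1000 + 10 * (mu - 3 * sigma)."""
    return 1000 + 10 * (mu - 3 * sigma)


# Same estimated skill, different uncertainty (illustrative values).
well_tested = displayed_elo(mu=30.0, sigma=2.0)
barely_tested = displayed_elo(mu=30.0, sigma=6.0)
print(well_tested, barely_tested)  # 1240.0 1120.0
```

The model with fewer matchups (larger sigma) is displayed 120 points lower even though its estimated skill is identical, which is what "conservative lower bound" means in practice.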

Best AI Models for Temporal Consistency

Temporal Consistency · Text-to-Video

Elo rankings from blind votes across 2 challenges in this category.

Price · Performance

Every ranked model plotted by price against its Elo. The upper-left frontier is the best value: top quality at the lowest cost.

5 models ranked · Last update: June 26, 2026 2:01 PM

#	Model	Provider	Price tier	¢/img	Elo
1	Sora 2 Pro	OpenAI	$$$$	30¢	1249
2	Seedance 2.0	ByteDance	$$$$	18.1¢	1117
3	Veo 3.1 Fast	Google	$$$$	10¢	1099
4	P-Video	PrunaAI	$$	2¢	1075
5	Grok Imagine Video	xAI	$$$	5¢	1070
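The best-value frontier from the price-versus-Elo plot can be reproduced from the rows above. This is a rough sketch, assuming "frontier" means the set of models that no cheaper model matches or beats on Elo; the page does not define it more precisely, and the names `Model` and `value_frontier` are hypothetical.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    cents: float  # listed price in cents
    elo: int


# Rows copied from the ranking table.
MODELS = [
    Model("Sora 2 Pro", 30.0, 1249),
    Model("Seedance 2.0", 18.1, 1117),
    Model("Veo 3.1 Fast", 10.0, 1099),
    Model("P-Video", 2.0, 1075),
    Model("Grok Imagine Video", 5.0, 1070),
]


def value_frontier(models: List[Model]) -> List[Model]:
    """Keep each model whose Elo beats every cheaper model's Elo."""
    frontier = []
    best_elo = float("-inf")
    # Walk from cheapest to most expensive; on a price tie, higher Elo first.
    for m in sorted(models, key=lambda m: (m.cents, -m.elo)):
        if m.elo > best_elo:
            frontier.append(m)
            best_elo = m.elo
    return frontier


for m in value_frontier(MODELS):
    print(f"{m.name}: {m.cents}¢, Elo {m.elo}")
```

Under this definition, Grok Imagine Video falls off the frontier because P-Video is both cheaper and rated higher; the other four models each buy additional Elo at a higher price.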

Highlighted challenges

Text-to-Video · Human Fidelity · Temporal Consistency

The Rubik's Gauntlet

This prompt is one of the hardest single tests for state-of-the-art video models in 2026 because it simultaneously demands extreme fine-motor precision at high speed, long-term physical consistency (the cube must genuinely solve without morphing), and complex multi-element rendering (hyper-detailed skin, sweat, glossy reflections, and dynamic camera movement). These are all areas where even top models still frequently break down.

Text-to-Video · Human Fidelity · Temporal Consistency

The Soul Gauntlet

This challenge targets one of the hardest remaining frontiers in 2026 video generation: conveying genuine human emotion through subtle facial acting, realistic tear physics, and micro-expressions. While many models can create beautiful faces, very few can deliver emotionally convincing performances without looking uncanny or robotic.
