ARENA Leaderboard

See how AI image models stack up against each other.

Which model makes the best edits?

Same source image, same instruction, blind community votes. See which models handle edits best.

Best AI Models for Image To Video

2 models ranked · Last update: June 19, 2026 2:16 PM

#	Model	Price	¢/img	Speed	Elo
1	Wan 2.6 Alibaba	$$	3¢		1042
2	Grok Imagine Video xAI	$$$	5¢		991

As of June 2026, Alibaba’s Wan 2.6 leads the Image-to-Video leaderboard with an Elo of 1042, a 51-point gap over xAI’s Grok Imagine Video at 991. While Wan 2.6 holds a superior 25.0% win rate, it operates 32% slower than its primary competitor. Despite the speed deficit, Wan 2.6 is the more cost-efficient option, delivering top-ranked results at a 40% lower price per generation than Grok (3¢ versus 5¢).
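The gap and price figures can be checked against the table rows with a few lines of arithmetic. This is only an illustrative sketch: the dictionary layout and variable names are made up for the example, and the numbers are copied from the table above.

```python
# Values copied from the leaderboard table; the structure is hypothetical.
models = {
    "Wan 2.6": {"elo": 1042, "cents_per_generation": 3},
    "Grok Imagine Video": {"elo": 991, "cents_per_generation": 5},
}

leader = models["Wan 2.6"]
runner_up = models["Grok Imagine Video"]

# Absolute Elo gap between rank 1 and rank 2.
elo_gap = leader["elo"] - runner_up["elo"]

# Relative saving of the leader's price compared with the runner-up's price.
price_saving = 1 - leader["cents_per_generation"] / runner_up["cents_per_generation"]

print(f"Elo gap: {elo_gap} points")
print(f"Price saving: {price_saving:.0%}")
```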

Elo vs Cost

Elo vs Speed

Think a model is ranked wrong? Cast your vote

Challenges

Celebrity Arrival Image-to-Video Cinematic

Animation prompt

“Ultra-realistic celebrity arrival scene in New York City, filmed as one continuous handheld shot from inside a dense crowd behind barricades. Documentary realism, natural micro-shake, no cuts, no artificial camera moves. Use the subject from the reference image as the main character. Keep face identity highly consistent across all frames. The outfit must match the reference image exactly. The subject should appear calm and controlled, with a subtle confident smile. Nighttime outside a fancy hotel. Lighting comes from street lights, hotel entrance lights, media flashes, phone screens, and reflections on polished cars and glass. Soft realistic shadows and slight atmospheric haze. Natural environment audio only: loud crowd cheering, overlapping voices, shouting fans, camera shutter clicks, phones recording, distant city noise, footsteps, fabric movement, and security activity. Scene flow: The camera begins inside the crowd, partially blocked by heads, raised phones, waving hands, and phone screens. The crowd is chaotic and excited. The camera lifts slightly above shoulder level. Focus shifts naturally between the crowd and the hotel entrance. The subject exits the hotel in the distance as camera flashes go off. Security pushes the crowd back, causing natural camera shake. Through gaps in the crowd, the subject becomes visible, first soft and partially obscured, then clearer as he walks forward with a small escort team. The subject walks up to a fan and signs a printed photo of himself from the reference image. The camera pushes in naturally, not digitally. The subject is now clearly visible near center frame. He walks confidently, raises one hand, and gives a calm wave with a slight smile while the camera struggles to keep him framed. A convoy of three large premium SUVs appears by the curb. Security opens the back door of the middle black Suburban. The subject enters quickly, rolls the window down, and waves to the crowd as the vehicles begin moving away. 
The crowd jumps, cheers, and records the moment on their phones.”

Input

Top 3

Kling V3 Omni Pro

Wan 2.6

Grok Imagine Video

3 models competed


FAQ

What is the best AI image to video model?

Based on blind community voting, Wan 2.6 is currently the #1 ranked AI image to video model with an Elo rating of 1042. Rankings update in real time as new votes come in.

How are AI image to video models ranked on Lumenfall?

Lumenfall Arena ranks AI models through blind community voting. In each matchup, two models generate from the same prompt, and voters pick the better result without seeing model names. Votes are processed using TrueSkill, a Bayesian rating algorithm developed by Microsoft Research, and the result is converted into a single Elo score reflecting each model's relative quality.
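To make the rating step concrete, here is a minimal sketch of the standard two-player TrueSkill update for a single vote with no draws. It is not Lumenfall's implementation: the parameter values are the commonly used TrueSkill defaults (an assumption, not stated on this page), the names `Rating` and `update` are hypothetical, and the sketch leaves out the dynamics term that full TrueSkill adds to the uncertainty before each update.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist

# Assumed parameters: common TrueSkill defaults, not confirmed for Lumenfall.
MU0 = 25.0          # initial skill estimate
SIGMA0 = MU0 / 3    # initial uncertainty
BETA = SIGMA0 / 2   # per-matchup performance noise

_STD_NORMAL = NormalDist()  # standard normal, used for pdf and cdf


@dataclass
class Rating:
    mu: float = MU0
    sigma: float = SIGMA0


def update(winner: Rating, loser: Rating) -> tuple[Rating, Rating]:
    """Return new (winner, loser) ratings after one vote, ignoring draws."""
    # Total standard deviation of the performance difference.
    c = sqrt(2 * BETA**2 + winner.sigma**2 + loser.sigma**2)
    t = (winner.mu - loser.mu) / c
    # Correction factors for a truncated Gaussian (win/loss case).
    v = _STD_NORMAL.pdf(t) / _STD_NORMAL.cdf(t)
    w = v * (v + t)
    new_winner = Rating(
        mu=winner.mu + winner.sigma**2 / c * v,
        sigma=winner.sigma * sqrt(1 - winner.sigma**2 / c**2 * w),
    )
    new_loser = Rating(
        mu=loser.mu - loser.sigma**2 / c * v,
        sigma=loser.sigma * sqrt(1 - loser.sigma**2 / c**2 * w),
    )
    return new_winner, new_loser


# Two unrated models meet; the first one wins the vote.
model_a, model_b = update(Rating(), Rating())
print(model_a, model_b)
```

The winner's mu rises and the loser's falls by the same amount when both start from the same rating, and both sigmas shrink because each vote adds information. An upset (a low-rated model beating a high-rated one) makes `t` negative, which increases `v` and so produces a larger shift.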

What is an Elo rating for AI models?

An Elo rating is a numerical score representing a model's skill relative to other models. Under the hood, Lumenfall uses TrueSkill, which tracks two values per model: mu (estimated skill) and sigma (uncertainty). The displayed Elo is calculated as 1000 + 10 × (mu − 3 × sigma), a conservative lower bound on the model's skill. Because high uncertainty lowers the score, a model must prove itself consistently across many matchups to earn a high rating.
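The formula above translates directly into code. The function name `displayed_elo` is hypothetical, and the example inputs are illustrative values rather than real leaderboard data.

```python
def displayed_elo(mu: float, sigma: float) -> float:
    """Conservative displayed score: 1000 + 10 * (mu - 3 * sigma)."""
    return 1000 + 10 * (mu - 3 * sigma)


# Same estimated skill, different uncertainty: the better-tested model
# (smaller sigma) gets the higher displayed score.
well_tested = displayed_elo(mu=30.0, sigma=1.0)   # 1000 + 10 * 27
barely_tested = displayed_elo(mu=30.0, sigma=6.0)  # 1000 + 10 * 12
print(well_tested, barely_tested)
```

If the ratings start from the usual TrueSkill defaults (mu = 25, sigma = 25/3, which this page does not confirm), mu − 3 × sigma is zero for a new model, so it would enter the leaderboard at about 1000 and move from there as votes reduce its uncertainty.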
