Text-to-Video challenges

Every text-to-video challenge in the arena, scored with TrueSkill as the votes come in.
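
As a rough illustration of how a single vote could move two models' scores, here is a minimal sketch of the two-player, no-draw TrueSkill update. It assumes the commonly used default parameters (mu = 25, sigma = 25/3, beta = sigma/2, tau = sigma/100) and that every vote picks a winner. The arena's actual parameters and tie handling are not stated here, and the names `Rating` and `update` are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumed defaults (not confirmed by the arena): the usual TrueSkill starting values.
MU0 = 25.0
SIGMA0 = MU0 / 3      # initial uncertainty
BETA = SIGMA0 / 2     # performance noise per comparison
TAU = SIGMA0 / 100    # small dynamics noise added before each update


@dataclass(frozen=True)
class Rating:
    mu: float = MU0
    sigma: float = SIGMA0


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update(winner: Rating, loser: Rating) -> tuple[Rating, Rating]:
    """Return updated (winner, loser) ratings after one vote, with no draws."""
    # Inflate each variance slightly so ratings never freeze completely.
    var_w = winner.sigma ** 2 + TAU ** 2
    var_l = loser.sigma ** 2 + TAU ** 2
    c2 = 2 * BETA ** 2 + var_w + var_l
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c
    # v shifts the means, w shrinks the variances; both come from the
    # truncated Gaussian of the performance difference (draw margin 0).
    # Note: _cdf(t) can underflow for extreme upsets; a sketch-level caveat.
    v = _pdf(t) / _cdf(t)
    w = v * (v + t)
    new_winner = Rating(winner.mu + var_w / c * v,
                        math.sqrt(var_w * (1 - var_w / c2 * w)))
    new_loser = Rating(loser.mu - var_l / c * v,
                       math.sqrt(var_l * (1 - var_l / c2 * w)))
    return new_winner, new_loser


if __name__ == "__main__":
    a, b = update(Rating(), Rating())
    print(a, b)
```

An upset (a low-rated model beating a high-rated one) makes `t` negative, which makes `v` larger, so surprising votes move the scores more than expected ones.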

The Rubik's Gauntlet

This prompt is one of the hardest single tests for state-of-the-art video models in 2026 because it simultaneously demands extreme fine-motor precision at high speed, long-term physical consistency (the cube must genuinely solve without morphing), and complex multi-element rendering (hyper-detailed skin, sweat, glossy reflections, and dynamic camera movement). These are all areas where even top models still frequently break down.

The Soul Gauntlet

This is one of the hardest remaining frontiers in 2026 video generation: it tests whether models can convey genuine human emotion through subtle facial acting, realistic tear physics, and micro-expressions. While many models can create beautiful faces, very few can deliver emotionally convincing performances without looking uncanny or robotic.

Neon Rain Reverie

This prompt is exceptionally difficult because it combines complex fluid dynamics (rain, splashing, clinging wet fabric), advanced material simulation (flowing silk and hair in wind), and atmospheric lighting, three areas where even top 2026 models still frequently produce artifacts or unrealistic behavior.

The Will Smith Spaghetti Challenge

“Will Smith eating spaghetti” has become the unofficial benchmark of generative video for one reason: it is deceptively simple yet brutally revealing. This single prompt exposes weaknesses that flashy action scenes or beautiful landscapes often hide. A model can generate stunning visuals yet still fail spectacularly when asked to make a human being convincingly eat.