Nano Banana 2 is Gemini 3.1 Flash with image generation capabilities. It is a high-efficiency image generation model that supports text rendering, reference images, search grounding, and thinking mode, and it is the efficient counterpart to Gemini 3 Pro Image.
These two models have not yet faced off in a shared challenge. Here is how their skills stack up side by side.
Nano Banana 2
Arena score: 28.5
#1 of 48 in Text-to-Image (the best Text-to-Image model right now)
Top 2 in Image Editing
Skill signature: Text-to-Image
Grok Imagine Video
Arena score: 13.7
#6 of 7 in Text-to-Video
Top 3 in Image-to-Video
Not yet settled
Nano Banana 2 and Grok Imagine Video have not faced off in a shared challenge yet.
Until then, the skill signatures above are the most honest read. Votes cast in the arena are what will put them head to head.
Explore each model
Grok Imagine Video: xAI's video generation model based on the Aurora architecture. It supports text-to-video, image-to-video, and video editing, with native audio-visual synthesis at up to 720p.