Text-to-Image challenges

Every text-to-image challenge in the arena, scored with TrueSkill as the votes come in. Filter by skill to narrow it down.
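For readers unfamiliar with TrueSkill, here is a minimal sketch of how a single head-to-head vote could move two models' ratings. It is not the arena's actual implementation: the names `Rating` and `update_after_vote` are hypothetical, the starting values (mu = 25, sigma = 25/3, beta = 25/6) are the commonly cited TrueSkill defaults rather than anything this page states, and the sketch assumes two-player matches with no draws and no skill drift between votes.

```python
import math
from dataclasses import dataclass

# Assumed defaults (commonly cited TrueSkill values, not taken from the arena).
BETA = 25.0 / 6.0


@dataclass(frozen=True)
class Rating:
    """Hypothetical container for a model's skill estimate."""
    mu: float = 25.0          # estimated mean skill
    sigma: float = 25.0 / 3   # uncertainty about that estimate


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_after_vote(winner: Rating, loser: Rating, beta: float = BETA):
    """Two-player TrueSkill update for one vote, assuming no draws."""
    # Total variance of the performance difference.
    c = math.sqrt(2.0 * beta ** 2 + winner.sigma ** 2 + loser.sigma ** 2)
    t = (winner.mu - loser.mu) / c
    # v shifts the means; w shrinks the variances.
    v = _pdf(t) / _cdf(t)
    w = v * (v + t)

    new_winner = Rating(
        mu=winner.mu + (winner.sigma ** 2 / c) * v,
        sigma=winner.sigma * math.sqrt(1.0 - (winner.sigma ** 2 / c ** 2) * w),
    )
    new_loser = Rating(
        mu=loser.mu - (loser.sigma ** 2 / c) * v,
        sigma=loser.sigma * math.sqrt(1.0 - (loser.sigma ** 2 / c ** 2) * w),
    )
    return new_winner, new_loser
```

Under these assumptions, an upset (a low-rated model beating a high-rated one) moves both means more than an expected result does, and every vote reduces the uncertainty of both models involved.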

Geometric Composition

“A glass cube on a wooden table. Inside the cube is a small blue sphere. On top of the cube sits a red book. A green plant is behind the cube, parti...”

Best
ImagineArt 1.5 (Preview)
Mid
Nano Banana Pro
Worst
DALL-E 2
Portrait

Fantasy Warrior

“Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torch...”

Best
ImagineArt 1.5 (Preview)
Mid
Z-Image Turbo
Worst
DALL-E 3

Isometric Miniature Diorama Scenes

“Create a clear, 45° top-down isometric miniature 3D cartoon scene of Japan's signature dish: sushi, with soft refined textures, realistic PBR mater...”

Best
Seedream 4.5
Mid
FLUX.2 [flex]
Worst
DALL-E 3
Text Rendering

Modern Clean Menu

“Modern minimalist restaurant menu design, white background with colorful food photos in grid, sections for appetizers/pizza/mains, bold sans-serif ...”

Best
Grok Imagine Image
Mid
GPT Image 2
Worst
DALL-E 2
Text Rendering Product, Branding & Commercial

Vintage Cafe Logo

“Vintage minimalist restaurant logo for "Caffè Florian", retro cloche dome with steam and "Est. 1720" banner, classic typography, warm brown and cre...”

Best
GPT Image 1.5
Mid
Seedream 4.5
Worst
DALL-E 2

Adorable Baby Animals in Sunny Meadow

“Hyper-photorealistic scene of fluffy baby animals—a golden retriever puppy, tabby kitten, baby bunny, and red fox kit—with big expressive eyes and ...”

Best
Recraft V4 Pro
Mid
FLUX.2 [max]
Worst
DALL-E 3
Photorealism

Candid Street Photography

“A candid street photo of an elderly Japanese man repairing a red bicycle in light rain, reflections on wet pavement, shallow depth of field, 50mm l...”

Best
GPT Image 1 Mini
Mid
FLUX.2 [dev] Turbo
Worst
DALL-E 3
Text Rendering

Apollo 11: Journey to Tranquility

“Create a clean, modern vector infographic poster about the Apollo 11 mission. NASA-inspired palette (navy, white, muted red, light gray). Flat-vect...”

Best
Nano Banana Pro
Mid
Nano Banana
Worst
Wan 2.6
Text Rendering Photorealism

Magic Burger Explosion: Fiery Photorealism Challenge

This prompt forces models to simultaneously nail a highly specific, multi-layered commercial scene: dynamic exploded composition with multiple flying food elements, photorealistic textures, dramatic fiery lighting with embers, and precisely integrated glowing text, all while keeping strong visual impact. It is a perfect stress test that quickly separates models with true prompt mastery and creative control from those that miss details, break physics, or produce generic results.

Best
GPT Image 2
Mid
GPT Image 1.5
Worst
Stable Diffusion 3.5 Large Turbo
Photorealism

The Capybara Taxi Driver

This challenge seems to be difficult for models because it mixes reality with fiction. Most models struggle to keep the taxi realistic or lose instructions such as not placing the passenger in the back seat.

Best
Seedream 5.0 Lite
Mid
Recraft V4
Worst
DALL-E 3
Text Rendering Photorealism

Chalkboard Menu

This challenge forces models to use one consistent handwritten style across an entire dense menu instead of defaulting to clean printed text for the smaller details, a very common failure that reveals how well they actually understand and maintain stylistic coherence.

Best
Grok Imagine Image
Mid
FLUX.1 [schnell] FP8
Worst
Wan 2.7
Art Photorealism

The Reversed Rodeo

This competition tests how well AI image models truly understand language versus how much they rely on visual habits from their training data. The prompt is deliberately simple on the surface but devilishly hard in practice. Most models default to the familiar trope of an astronaut riding a horse. By forcing the reversal, we measure three critical capabilities that separate good models from great ones: strict instruction following (including negations), accurate subject-object relationships and spatial hierarchy, and resistance to strong dataset biases.

Best
GPT Image 2
Mid
Grok Imagine Image Pro
Worst
Stable Diffusion 3.5 Medium