Chalkboard Menu
Nine models were given the same prompt, and the community voted blind on which outputs looked best.
This challenge forces models to use one consistent handwritten style across an entire dense menu instead of defaulting to clean printed text for the smaller details. That fallback is a very common failure, so the challenge reveals how well a model actually understands and maintains stylistic coherence.
#1 — DALL-E 3
Challenge Rankings
| # | Model | Elo |
|---|---|---|
| 1 | DALL-E 3 | 1179 |
| 2 | GPT Image 1 Mini | 1170 |
| 3 | Qwen Image 2.0 Pro | 1138 |
| 4 | | 1127 |
| 5 | | 1115 |
| 6 | Recraft V4 Pro | 1063 |
| 7 | | 1055 |
| 8 | | 1000 |
| 9 | | 994 |
GPT Image 1 Mini ranks second with an 1170 Elo, maintaining stylistic coherence at a cost 15x lower than third-place Qwen Image 2.0 Pro (1138 Elo). Despite its low 1063 Elo, Recraft V4 Pro carries the highest price at $0.250 per image, while the free FLUX.1 [schnell] FP8 remains competitive, within 55 points of the top spot.
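To put the Elo gaps in perspective, here is a minimal sketch that converts a rating difference into an expected head-to-head win rate. It assumes the standard logistic Elo formula with a 400-point scale; this page does not state which Elo variant the leaderboard uses, so treat `expected_score` as an illustrative helper, not the site's actual method.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumption: logistic curve with a 400-point scale, which is the
    conventional choice and may differ from what this leaderboard uses.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Elo values copied from the rankings table.
first, second, last = 1179, 1170, 994

# A 9-point gap is close to a coin flip.
print(f"#1 vs #2: {expected_score(first, second):.3f}")

# A 55-point gap still leaves the lower-rated model a substantial share of wins.
print(f"55-point gap: {expected_score(first, first - 55):.3f}")

# The full spread between the first and last place.
print(f"#1 vs #9: {expected_score(first, last):.3f}")
```

Under this assumption, small gaps near the top of the table translate into near-even matchups, which is why a model 55 points back can still be described as competitive.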
Elo vs Cost
Elo vs Speed
Competitors
9 models, ranked by Elo
Highlighted Battles
The most competitive head-to-head matchups, selected by closeness and vote count.