The Capybara Taxi Driver
19 models were given the same prompt, and the community voted blind on which outputs looked best.
This challenge seems to be difficult for models because it mixes reality with fiction. Most models either struggle to keep the taxi realistic or drop instructions, for example by placing the passenger somewhere other than the back seat.
#1 — Seedream 5.0 Lite
Challenge Rankings
| # | Model | Elo |
|---|---|---|
| 1 | Seedream 5.0 Lite | 1225 |
| 2 | Z-Image Turbo | 1214 |
| 3 | | 1210 |
| 4 | | 1206 |
| 5 | | 1204 |
| 6 | | 1158 |
| 7 | | 1146 |
| 8 | | 1135 |
| 9 | | 1133 |
| 10 | | 1131 |
| 11 | | 1126 |
| 12 | | 1115 |
| 13 | | 1113 |
| 14 | | 1112 |
| 15 | | 1103 |
| 16 | | 1098 |
| 17 | | 1087 |
| 18 | | 991 |
| 19 | | 990 |
Seedream 5.0 Lite leads the photorealism challenge with a 100% win rate and 1225 Elo, maintaining an 11-point lead over the budget-friendly Z-Image Turbo. Despite the complexity of the scene composition, the $0.005 Z-Image Turbo outperforms premium models like Nano Banana Pro and Wan 2.7 Pro in both Elo and generation speed.
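To put the 11-point gap in perspective, here is a minimal sketch of the textbook Elo expected-score formula. It assumes the conventional logistic curve with a 400-point scale; the page does not say which Elo variant or scale it uses, so `expected_score` and its `scale` default are illustrative rather than the site's actual method.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo logistic model.

    Assumption: the 400-point scale is the conventional default, not a
    value confirmed by the leaderboard.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings taken from the rankings: 1225 for first place, 11 points more than Z-Image Turbo.
top_vs_second = expected_score(1225, 1214)
print(f"Expected score, #1 vs #2: {top_vs_second:.3f}")
```

Under these assumptions an 11-point lead corresponds to an expected score of roughly 0.52, so the top two models are close to evenly matched by rating. The 100% win rate is an observed result from the votes cast so far, which is a separate quantity from this expectation.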
Elo vs Cost
Elo vs Speed
Competitors
19 models, ranked by Elo
Highlighted Battles
The most competitive head-to-head matchups, selected by closeness and vote count.