The Reversed Rodeo
All models were given the same prompt, and the community voted blind on which outputs looked best.
How it works
This competition tests how well AI image models truly understand language versus how much they rely on visual habits from their training data. The prompt is deliberately simple on the surface but devilishly hard in practice. Most models default to the familiar trope of an astronaut riding a horse. By forcing the reversal, we measure three critical capabilities that separate good models from great ones:
- Strict instruction following (including negations)
- Accurate subject-object relationships and spatial hierarchy
- Resistance to strong dataset biases
Prompt
“Horse riding astronaut in space — horse on top, not vice versa. Surreal, highly detailed, cinematic.”
Voters were asked to judge by
Horse actually on the back of the astronaut