State of AI Image Generation, Q1 2026
Blind rankings from 3,500+ human votes across 29 models
There’s no shortage of opinions about which AI image model is “best.” There’s a severe shortage of data. We built the Lumenfall Arena to fix that. Since February 2026, human voters have judged 3,509 blind, head-to-head matchups between 29 AI image models from 12 organizations.
Street photography, fantasy warriors, logo design, floral mandalas, isometric dioramas. Same prompt, two images, no model names. Pick the one you like better. Here’s what we found.
The rankings: text-to-image
| Rank | Model | Creator | Elo | Win Rate | Battles |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Flash Image Preview (Nano Banana 2) | Google | 1299 | 79.7% | 74 |
| 2 | GPT Image 1.5 | OpenAI | 1283 | 69.9% | 216 |
| 3 | Gemini 3 Pro Image Preview (Nano Banana Pro) | Google | 1277 | 64.6% | 175 |
| 4 | FLUX.2 [dev] Turbo | fal | 1269 | 56.4% | 181 |
| 5 | FLUX.2 [max] | Black Forest Labs | 1266 | 58.7% | 167 |
| 6 | FLUX.2 [dev] Flash | fal | 1262 | 57.9% | 126 |
| 7 | Seedream 4.5 | ByteDance | 1261 | 58.9% | 158 |
| 8 | ImagineArt 1.5 (Preview) | Vyro AI | 1260 | 57.1% | 170 |
| 9 | FLUX.2 [pro] | Black Forest Labs | 1255 | 53.8% | 145 |
| 10 | Z-Image Turbo | Alibaba | 1252 | 46.5% | 198 |
Full rankings for all 29 models on lumenfall.ai/leaderboard.
Five things we didn’t expect
- Google swept the top spots with their Nano Banana models. Gemini 3.1 Flash Image Preview (Nano Banana 2) is #1 with a 79.7% win rate. Its sibling, Gemini 3 Pro Image Preview (Nano Banana Pro), is #3. The Nano Banana family is clearly leading Google’s image generation performance right now. One caveat: Nano Banana 2 has only 74 battles so far.
- FLUX.2 has the deepest bench in the game. Black Forest Labs and fal’s in-house distilled versions together hold four of the top nine spots. FLUX.2 [dev] Turbo at #4. FLUX.2 [max] at #5. FLUX.2 [dev] Flash at #6. FLUX.2 [pro] at #9. No single FLUX.2 variant takes the crown, but the family’s consistency across different speed and quality tiers is unmatched. If you need a model that won’t embarrass you on any given prompt, this family is hard to beat.
- GPT Image 1.5 is the most battle-tested model near the top. 216 matchups is the most of any model in the top three. A 69.9% win rate at that volume isn’t luck. It sits at #2 overall, just 16 Elo points behind Gemini 3.1 Flash Image Preview but with nearly three times the data behind it. Whatever OpenAI is doing with image generation, it’s working.
- The biggest surprises aren’t from the biggest companies. ByteDance’s Seedream 4.5 sits at #7 with a 58.9% win rate across 158 battles, wedged between FLUX.2 variants in one of the most competitive parts of the table. Vyro AI’s ImagineArt 1.5 holds #8 with 57.1% across 170 battles. Neither company dominates the Western AI image conversation, but both are outperforming brands with much more mindshare. ByteDance’s newer Seedream 5.0 Lite has also entered the arena (currently ~#13 with fewer battles so far) and shows early promise, especially in editing.
- Image editing rankings look nothing like generation rankings. Our editing leaderboard (1,517 matchups across 16 models) reshuffles the deck:
| Rank | Model | Creator | Elo | Battles |
|---|---|---|---|---|
| 1 | Gemini 3 Pro Image Preview (Nano Banana Pro) | Google | 1245 | 243 |
| 2 | Qwen Image Edit 2511 | Alibaba | 1230 | 546 |
| 3 | GPT Image 1.5 | OpenAI | 1230 | 190 |
| 4 | FLUX.2 [flex] | Black Forest Labs | 1227 | 102 |
| 5 | Gemini 2.5 Flash Image (Nano Banana) | Google | 1227 | 215 |
FLUX.2 [flex] jumps from 16th in generation to 4th in editing. Alibaba’s Qwen Image Edit 2511 has by far the most editing battles of any model (546, more than double the next closest) with a 56.2% win rate. It’s the editing workhorse. Generation skill doesn’t predict editing skill. Treat them as separate decisions.
Organization rankings
Which company has the strongest portfolio overall?
| Organization | Models | Avg Elo | Best Model |
|---|---|---|---|
| fal | 2 | 1266 | FLUX.2 [dev] Turbo |
| OpenAI | 2 | 1265 | GPT Image 1.5 |
| Vyro AI | 1 | 1260 | ImagineArt 1.5 |
| Black Forest Labs | 4 | 1249 | FLUX.2 [max] |
| ByteDance | 3 | 1248 | Seedream 4.5 |
| xAI | 2 | 1236 | Grok Imagine Image Pro |
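The Avg Elo column is just the mean of each organization’s per-model ratings, rounded to the nearest point. A minimal sketch of that aggregation, using a small hypothetical slice of the leaderboard data (the real ratings live on lumenfall.ai/leaderboard):

```python
from statistics import mean

# Hypothetical (model, org, elo) rows for illustration only.
models = [
    ("FLUX.2 [dev] Turbo", "fal", 1269),
    ("FLUX.2 [dev] Flash", "fal", 1262),
    ("GPT Image 1.5", "OpenAI", 1283),
]

# Group Elo scores by organization.
by_org: dict[str, list[int]] = {}
for name, org, elo in models:
    by_org.setdefault(org, []).append(elo)

# Average each group and round to a whole Elo point.
avg_elo = {org: round(mean(elos)) for org, elos in by_org.items()}
# fal: (1269 + 1262) / 2 = 1265.5 -> 1266
```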
How the arena works
We use TrueSkill, Microsoft’s Bayesian rating system, which updates both a model’s estimated skill and the system’s confidence in that estimate after every matchup. We display the results as Elo scores (starting at 1000) because most people understand the scale from chess. Beating a higher-rated model earns more points than beating a lower-rated one, same as Elo, but TrueSkill converges faster and handles models with fewer battles more carefully.
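To make the “beating a higher-rated model earns more points” intuition concrete, here is a sketch of the classic Elo update rule. This is an illustration of the displayed Elo scale, not the arena’s actual TrueSkill machinery; the `k=32` factor is a conventional choice, not a Lumenfall parameter:

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32) -> tuple[float, float]:
    """Return updated (r_a, r_b) after one decisive matchup."""
    e_a = elo_expected(r_a, r_b)       # A's expected score
    s_a = 1.0 if a_won else 0.0        # A's actual score
    return r_a + k * (s_a - e_a), r_b + k * ((1 - s_a) - (1 - e_a))

# An underdog (1000) upsetting a favorite (1299) gains ~27 points;
# the favorite winning instead would have gained only ~5.
underdog, favorite = elo_update(1000, 1299, a_won=True)
```

TrueSkill differs in that it also tracks a per-model uncertainty, so a model with 74 battles moves faster (and is trusted less) than one with 216.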
Voters see two images generated from the same prompt. They don’t know which model made which image. They pick the better one, or call it a tie.
A few numbers on the integrity of the data:
- Left-side images won 51.0% of decisive (non-tie) votes, close to the 50% you’d expect if position doesn’t matter.
- Only 7.8% of matchups were ties, meaning voters could tell the difference in most head-to-head comparisons.
- 20 competitions cover prompts ranging from “Adorable Baby Animals in Sunny Meadow” to “Apollo 11: Journey to Tranquility” to “Vintage Cafe Logo.”
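Both integrity checks above are simple to recompute from a raw vote log. A sketch, assuming a hypothetical list of vote outcomes (the real log has 3,509 entries):

```python
# Hypothetical vote log: each entry is "left", "right", or "tie".
votes = ["left", "right", "tie", "left", "left", "right", "tie", "right"]

# Position bias: share of decisive (non-tie) votes won by the left image.
decisive = [v for v in votes if v != "tie"]
left_share = sum(v == "left" for v in decisive) / len(decisive)

# Tie rate: share of all matchups voters called a tie.
tie_rate = votes.count("tie") / len(votes)
```

A `left_share` near 0.5 (the arena measured 51.0%) suggests image position isn’t driving the rankings.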
Rankings update in real time on lumenfall.ai/leaderboard.
What to pick if you’re building something
If you need the best output and cost isn’t the constraint: Gemini 3.1 Flash Image Preview (Nano Banana 2) or GPT Image 1.5. Both are available through lumenfall.ai with a single integration.
If you’re watching costs: the FLUX.2 Turbo and Flash variants through fal are cheaper and still land in the top 6.
If you need image editing: look at Gemini 3 Pro Image Preview (Nano Banana Pro) and FLUX.2 [flex]. A model that generates well doesn’t necessarily edit well. The rankings are different enough that you should treat these as separate decisions.
What’s next
This is the first edition. We’ll publish a Q2 update in July with more models and more votes. The arena is live and the rankings shift as new votes come in. 3,509 votes from 268 participants is a real start, not a definitive verdict. We’re being transparent about sample sizes because we think that makes the data more useful, not less. As the vote count grows, so will our confidence in the tighter matchups.
If you want a say in the rankings: lumenfall.ai/arena.
If you want to run these models: lumenfall.ai.