State of AI Image Generation, Q1 2026
Blind rankings from 3,500+ human votes across 29 models
There's no shortage of opinions about which AI image model is "best", but there's a severe shortage of data. We built the Lumenfall Arena to fix that. Since February 2026, human voters have judged 3,509 blind, head-to-head matchups between 29 AI image models from 12 organizations.
Street photography, fantasy warriors, logo design, floral mandalas, isometric dioramas. Same prompt, two images, hidden model names. Users picked the one they liked better. Here's what we found.
The rankings: Image generation
| Rank | Model | Creator | Elo | Win Rate | Battles |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Flash Image Preview (Nano Banana 2) | Google | 1299 | 79.7% | 74 |
| 2 | GPT Image 1.5 | OpenAI | 1283 | 69.9% | 216 |
| 3 | Gemini 3 Pro Image Preview (Nano Banana Pro) | Google | 1277 | 64.6% | 175 |
| 4 | FLUX.2 [dev] Turbo | fal | 1269 | 56.4% | 181 |
| 5 | FLUX.2 [max] | Black Forest Labs | 1266 | 58.7% | 167 |
| 6 | FLUX.2 [dev] Flash | fal | 1262 | 57.9% | 126 |
| 7 | Seedream 4.5 | ByteDance | 1261 | 58.9% | 158 |
| 8 | ImagineArt 1.5 (Preview) | Vyro AI | 1260 | 57.1% | 170 |
| 9 | FLUX.2 [pro] | Black Forest Labs | 1255 | 53.8% | 145 |
| 10 | Z-Image Turbo | Alibaba | 1252 | 46.5% | 198 |
Full rankings for all 29 models at https://lumenfall.ai/arena
Five things that stood out for us
- Google swept the top spots with their Nano Banana models. Gemini 3.1 Flash Image Preview (Nano Banana 2) is #1 with a 79.7% win rate. Its sibling, Gemini 3 Pro Image Preview (Nano Banana Pro), is #3. The Nano Banana family is clearly leading Google's image generation performance right now. One caveat: Nano Banana 2 has only 74 battles so far, since it's a relatively new model.
- The FLUX.2 variants are remarkably close and the cheap ones are winning. Black Forest Labs and fal's distilled versions together hold four of the top nine spots, but the spread between them is razor-thin: just 14 Elo points separate FLUX.2 [dev] Turbo at #4 from FLUX.2 [pro] at #9. The more interesting story is the ordering. fal's budget-friendly [dev] Turbo (0.8¢/image) and [dev] Flash (0.5¢/image) rank higher than the pricier FLUX.2 [max] (3¢) and FLUX.2 [pro] (1.5¢). The cheaper distilled variants aren't just keeping up with the flagship models; they're outperforming them. If you're evaluating FLUX.2, you may not need the premium tier.
- GPT Image 1.5 is quietly one of the strongest models in the arena. It sits at #2 overall with a 69.9% win rate across 216 matchups — the most battles of any model in the top three. Despite that, it gets a fraction of the attention that Google's Nano Banana models receive. The Nano Banana launch dominated social media and tech coverage; GPT Image 1.5 just kept winning matchups. It's only 16 Elo points behind the top-ranked Nano Banana 2, and on the cost side it starts at 0.9¢ per image on the low-quality setting — though higher quality tiers cost more. If you're picking a model based on data rather than hype, this one deserves a closer look.
- The biggest surprises aren't from the biggest companies. ByteDance's Seedream 4.5 sits at #7 with a 58.9% win rate across 158 battles, wedged between FLUX.2 variants in one of the most competitive parts of the table. Vyro AI's ImagineArt 1.5 holds #8 with 57.1% across 170 battles. Neither company dominates the Western AI image conversation, but both are outperforming brands with much more mindshare. ByteDance's newer Seedream 5.0 Lite has also entered the arena (currently ~#13 with fewer battles so far) and shows early promise, especially in editing.
- Image editing rankings look nothing like generation rankings. Our editing leaderboard (1,517 matchups across 16 models) reshuffles the deck:
| Rank | Model | Creator | Elo | Battles |
|---|---|---|---|---|
| 1 | Gemini 3 Pro Image Preview (Nano Banana Pro) | Google | 1245 | 243 |
| 2 | Qwen Image Edit 2511 | Alibaba | 1230 | 546 |
| 3 | GPT Image 1.5 | OpenAI | 1230 | 190 |
| 4 | FLUX.2 [flex] | Black Forest Labs | 1227 | 102 |
| 5 | Gemini 2.5 Flash Image (Nano Banana) | Google | 1227 | 215 |
FLUX.2 [flex] jumps from 16th in generation to 4th in editing. Alibaba's Qwen Image Edit 2511 is the editing workhorse with a 56.2% win rate; notable given that the corresponding generation model, Qwen Image 2512, sits all the way down at #18 on the text-to-image leaderboard. Generation skill doesn't predict editing skill. Treat them as separate decisions.
Organization rankings
Which company has the strongest portfolio overall?
| Organization | Models | Avg Elo | Best Model |
|---|---|---|---|
| fal | 2 | 1266 | FLUX.2 [dev] Turbo |
| OpenAI | 2 | 1265 | GPT Image 1.5 |
| Vyro AI | 1 | 1260 | ImagineArt 1.5 |
| Black Forest Labs | 4 | 1249 | FLUX.2 [max] |
| ByteDance | 3 | 1248 | Seedream 4.5 |
| xAI | 2 | 1236 | Grok Imagine Image Pro |
| Google | 6 | 1228 | Nano Banana 2 |
How the arena works
We use TrueSkill, Microsoft's Bayesian rating system, which updates both a model's estimated skill and the system's confidence in that estimate after every matchup. We display the results as Elo scores (starting at 1000) because most people know that scale from chess. As in Elo, beating a higher-rated model earns more points than beating a lower-rated one, but TrueSkill converges faster and is more cautious with models that have few battles.
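For intuition, here is a minimal sketch of the classic Elo update that the displayed scores resemble. This is an illustration only, not Lumenfall's actual TrueSkill pipeline, and the K-factor of 32 is an arbitrary choice for the example:

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Ratings after one matchup. score_a is 1.0 for an A win,
    0.5 for a tie, 0.0 for a loss. Total rating points are conserved."""
    e_a = elo_expected(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new

# An upset (lower-rated model wins) moves both ratings much more
# than an expected result would.
print(elo_update(1000, 1299, 1.0))
```

The key property carries over to the arena: a mid-table model that beats the #1 entry gains far more than one that beats a bottom-ranked entry.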
Voters see two images generated from the same prompt. They don't know which model made which image. They pick the better one, or call it a tie.
A few numbers on the integrity of the data:
- Left-side images won 51.0% of decisive (non-tie) votes, close to the 50% you'd expect if position doesn't matter.
- Only 7.8% of matchups were ties, meaning voters could tell the difference most of the time.
- 20 competitions cover prompts ranging from "Adorable Baby Animals in Sunny Meadow" to "Apollo 11: Journey to Tranquility" to "Vintage Cafe Logo."
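A quick sanity check on the 51.0% left-side figure: using the article's own numbers (3,509 votes, 7.8% ties), a one-proportion z-test shows the deviation from 50% is within ordinary sampling noise. The decisive-vote count below is estimated from the tie rate, not an exact figure from the dataset:

```python
import math

def position_bias_z(total_votes: int, tie_rate: float, left_win_rate: float) -> float:
    """Z-score for the left-side win rate against the 50% null,
    computed over the (estimated) decisive votes."""
    decisive = round(total_votes * (1.0 - tie_rate))
    se = math.sqrt(0.25 / decisive)  # std. error of a proportion under p = 0.5
    return (left_win_rate - 0.5) / se

# Figures from the article: 3,509 votes, 7.8% ties, 51.0% left wins.
z = position_bias_z(3509, 0.078, 0.510)
print(f"z = {z:.2f}")  # well under the ~1.96 cutoff for 5% significance
```

So the data is consistent with no position bias, which is what you want from a blind arena.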
Rankings update in real time on lumenfall.ai/leaderboard.
What to pick if you're building something
If you need the best output and cost isn't the constraint: Gemini 3.1 Flash Image Preview (Nano Banana 2) or GPT Image 1.5. Both are available through lumenfall.ai with a single integration.
If you're watching costs: the FLUX.2 Turbo and Flash variants through fal are cheaper and still land in the top 6. Alibaba's Z-Image Turbo is another strong budget pick: it sits at #10 at just 0.5¢ per image, making it one of the best value-for-quality options on the board.
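One way to compare the budget picks is Elo points above the 1000 baseline per cent spent. The prices and Elo figures below are the ones quoted in this article (GPT Image 1.5 at its low-quality tier); the "value" metric is our own illustrative heuristic, and pricing changes often, so check current provider rates before relying on it:

```python
# (model, Elo, price in ¢/image) as quoted in the article above.
models = [
    ("FLUX.2 [dev] Turbo", 1269, 0.8),
    ("FLUX.2 [dev] Flash", 1262, 0.5),
    ("FLUX.2 [max]",       1266, 3.0),
    ("FLUX.2 [pro]",       1255, 1.5),
    ("Z-Image Turbo",      1252, 0.5),
    ("GPT Image 1.5",      1283, 0.9),  # low-quality tier price
]

# Crude value heuristic: Elo points above the 1000 starting rating per cent.
for name, elo, cents in sorted(models, key=lambda m: (m[1] - 1000) / m[2], reverse=True):
    print(f"{name:22s} elo={elo}  {cents:.1f}¢  value={(elo - 1000) / cents:.0f}")
```

By this (admittedly rough) yardstick the 0.5¢ models dominate, which matches the article's point that the distilled variants are the pragmatic choice.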
If you need image editing: look at Gemini 3 Pro Image Preview (Nano Banana Pro) and FLUX.2 [flex]. A model that generates well doesn't necessarily edit well. The rankings are different enough that you should treat these as separate decisions.
What's next
This is the first edition. We'll publish a Q2 update in July with more models and more votes. The arena is live and the rankings shift as new votes come in. 3,509 votes from 268 participants is a real start, not a definitive verdict. We're being transparent about sample sizes because we think that makes the data more useful, not less. As the vote count grows, so will our confidence in the tighter matchups.
If you want a say in the rankings: https://lumenfall.ai/arena/vote
If you want to run these models: lumenfall.ai