State of AI Image Generation, Q1 2026
Blind rankings from 3,500+ human votes across 29 models
There’s no shortage of opinions about which AI image model is “best.” There’s a severe shortage of data. We built the Lumenfall Arena to fix that. Since February 2026, human voters have judged 3,509 blind, head-to-head matchups between 29 AI image models from 12 organizations.
Street photography, fantasy warriors, logo design, floral mandalas, isometric dioramas. Same prompt, two images, no model names. Pick the one you like better. Here’s what we found.
The rankings: text-to-image
| Rank | Model | Creator | Elo | Win Rate | Battles |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Flash Image Preview (Nano Banana 2) | Google | 1299 | 79.7% | 74 |
| 2 | GPT Image 1.5 | OpenAI | 1283 | 69.9% | 216 |
| 3 | Gemini 3 Pro Image Preview (Nano Banana Pro) | Google | 1277 | 64.6% | 175 |
| 4 | FLUX.2 [dev] Turbo | fal | 1269 | 56.4% | 181 |
| 5 | FLUX.2 [max] | Black Forest Labs | 1266 | 58.7% | 167 |
| 6 | FLUX.2 [dev] Flash | fal | 1262 | 57.9% | 126 |
| 7 | Seedream 4.5 | ByteDance | 1261 | 58.9% | 158 |
| 8 | ImagineArt 1.5 (Preview) | Vyro AI | 1260 | 57.1% | 170 |
| 9 | FLUX.2 [pro] | Black Forest Labs | 1255 | 53.8% | 145 |
| 10 | Z-Image Turbo | Alibaba | 1252 | 46.5% | 198 |
Full rankings for all 29 models on lumenfall.ai/leaderboard.
Five things we didn’t expect
- Google swept the top spots with their Nano Banana models. Gemini 3.1 Flash Image Preview (Nano Banana 2) is #1 with a 79.7% win rate. Its sibling, Gemini 3 Pro Image Preview (Nano Banana Pro), is #3. The Nano Banana family is clearly leading Google’s image generation performance right now. One caveat: Nano Banana 2 has only 74 battles so far.
- FLUX.2 has the deepest bench in the game. Black Forest Labs and fal’s in-house distilled versions together hold four of the top nine spots. FLUX.2 [dev] Turbo at #4. FLUX.2 [max] at #5. FLUX.2 [dev] Flash at #6. FLUX.2 [pro] at #9. No single FLUX.2 variant takes the crown, but the family’s consistency across different speed and quality tiers is unmatched. If you need a model that won’t embarrass you on any given prompt, this family is hard to beat.
- GPT Image 1.5 is the most battle-tested model near the top. 216 matchups is the most of any model in the top three. A 69.9% win rate at that volume isn’t luck. It sits at #2 overall, just 16 Elo points behind Gemini 3.1 Flash Image Preview but with nearly three times the data behind it. Whatever OpenAI is doing with image generation, it’s working.
- The biggest surprises aren’t from the biggest companies. ByteDance’s Seedream 4.5 sits at #7 with a 58.9% win rate across 158 battles, wedged between FLUX.2 variants in one of the most competitive parts of the table. Vyro AI’s ImagineArt 1.5 holds #8 with 57.1% across 170 battles. Neither company dominates the Western AI image conversation, but both are outperforming brands with much more mindshare. ByteDance’s newer Seedream 5.0 Lite has also entered the arena (currently ~#13 with fewer battles so far) and shows early promise, especially in editing.
- Image editing rankings look nothing like generation rankings. Our editing leaderboard (1,517 matchups across 16 models) reshuffles the deck:
| Rank | Model | Creator | Elo | Battles |
|---|---|---|---|---|
| 1 | Gemini 3 Pro Image Preview (Nano Banana Pro) | Google | 1245 | 243 |
| 2 | Qwen Image Edit 2511 | Alibaba | 1230 | 546 |
| 3 | GPT Image 1.5 | OpenAI | 1230 | 190 |
| 4 | FLUX.2 [flex] | Black Forest Labs | 1227 | 102 |
| 5 | Gemini 2.5 Flash Image (Nano Banana) | Google | 1227 | 215 |
FLUX.2 [flex] jumps from 16th in generation to 4th in editing. Alibaba’s Qwen Image Edit 2511 has by far the most editing battles of any model (546, more than double the next closest) with a 56.2% win rate. It’s the editing workhorse. Generation skill doesn’t predict editing skill. Treat them as separate decisions.
Organization rankings
Which company has the strongest portfolio overall?
| Organization | Models | Avg Elo | Best Model |
|---|---|---|---|
| fal | 2 | 1266 | FLUX.2 [dev] Turbo |
| OpenAI | 2 | 1265 | GPT Image 1.5 |
| Vyro AI | 1 | 1260 | ImagineArt 1.5 |
| Black Forest Labs | 4 | 1249 | FLUX.2 [max] |
| ByteDance | 3 | 1248 | Seedream 4.5 |
| xAI | 2 | 1236 | Grok Imagine Image Pro |
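The Avg Elo column is just the mean of each organization’s per-model ratings, rounded to the nearest point. A minimal sketch of that aggregation, using a small hypothetical slice of the leaderboard data (the real ratings live on lumenfall.ai/leaderboard):

```python
from statistics import mean

# Hypothetical (model, org, elo) rows for illustration only.
models = [
    ("FLUX.2 [dev] Turbo", "fal", 1269),
    ("FLUX.2 [dev] Flash", "fal", 1262),
    ("GPT Image 1.5", "OpenAI", 1283),
]

# Group Elo scores by organization.
by_org: dict[str, list[int]] = {}
for name, org, elo in models:
    by_org.setdefault(org, []).append(elo)

# Average each group and round to a whole Elo point.
avg_elo = {org: round(mean(elos)) for org, elos in by_org.items()}
# fal: (1269 + 1262) / 2 = 1265.5 -> 1266
```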
How the arena works
We use TrueSkill, Microsoft’s Bayesian rating system, which updates both a model’s estimated skill and the system’s confidence in that estimate after every matchup. We display the results as Elo scores (starting at 1000) because most people understand the scale from chess. Beating a higher-rated model earns more points than beating a lower-rated one, same as Elo, but TrueSkill converges faster and handles models with fewer battles more carefully.
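To make the “beating a higher-rated model earns more points” intuition concrete, here is a sketch of the classic Elo update rule. This is an illustration of the displayed Elo scale, not the arena’s actual TrueSkill machinery; the `k=32` factor is a conventional choice, not a Lumenfall parameter:

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32) -> tuple[float, float]:
    """Return updated (r_a, r_b) after one decisive matchup."""
    e_a = elo_expected(r_a, r_b)       # A's expected score
    s_a = 1.0 if a_won else 0.0        # A's actual score
    return r_a + k * (s_a - e_a), r_b + k * ((1 - s_a) - (1 - e_a))

# An underdog (1000) upsetting a favorite (1299) gains ~27 points;
# the favorite winning instead would have gained only ~5.
underdog, favorite = elo_update(1000, 1299, a_won=True)
```

TrueSkill differs in that it also tracks a per-model uncertainty, so a model with 74 battles moves faster (and is trusted less) than one with 216.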
Voters see two images generated from the same prompt. They don’t know which model made which image. They pick the better one, or call it a tie.
A few numbers on the integrity of the data:
- Left-side images won 51.0% of decisive (non-tie) votes, close to the 50% you’d expect if position doesn’t matter.
- Only 7.8% of matchups were ties, meaning voters could tell the difference in most head-to-head comparisons.
- 20 competitions cover prompts ranging from “Adorable Baby Animals in Sunny Meadow” to “Apollo 11: Journey to Tranquility” to “Vintage Cafe Logo.”
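Both integrity checks above are simple to recompute from a raw vote log. A sketch, assuming a hypothetical list of vote outcomes (the real log has 3,509 entries):

```python
# Hypothetical vote log: each entry is "left", "right", or "tie".
votes = ["left", "right", "tie", "left", "left", "right", "tie", "right"]

# Position bias: share of decisive (non-tie) votes won by the left image.
decisive = [v for v in votes if v != "tie"]
left_share = sum(v == "left" for v in decisive) / len(decisive)

# Tie rate: share of all matchups voters called a tie.
tie_rate = votes.count("tie") / len(votes)
```

A `left_share` near 0.5 (the arena measured 51.0%) suggests image position isn’t driving the rankings.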
Rankings update in real time on lumenfall.ai/leaderboard.
What to pick if you’re building something
If you need the best output and cost isn’t the constraint: Gemini 3.1 Flash Image Preview (Nano Banana 2) or GPT Image 1.5. Both are available through lumenfall.ai with a single integration.
If you’re watching costs: the FLUX.2 Turbo and Flash variants through fal are cheaper and still land in the top 6.
If you need image editing: look at Gemini 3 Pro Image Preview (Nano Banana Pro) and FLUX.2 [flex]. A model that generates well doesn’t necessarily edit well. The rankings are different enough that you should treat these as separate decisions.
What’s next
This is the first edition. We’ll publish a Q2 update in July with more models and more votes. The arena is live and the rankings shift as new votes come in. 3,509 votes from 268 participants is a real start, not a definitive verdict. We’re being transparent about sample sizes because we think that makes the data more useful, not less. As the vote count grows, so will our confidence in the tighter matchups.
If you want a say in the rankings: lumenfall.ai/arena.
If you want to run these models: lumenfall.ai.