Which AI Makes the Best Images? We Had Humans Vote 3,500 Times to Find Out

Till · 6 min read

State of AI Image Generation, Q1 2026
Blind rankings from 3,500+ human votes across 29 models

There’s no shortage of opinions about which AI image model is “best.” There’s a severe shortage of data. We built the Lumenfall Arena to fix that. Since February 2026, human voters have judged 3,509 blind, head-to-head matchups between 29 AI image models from 12 organizations.

Street photography, fantasy warriors, logo design, floral mandalas, isometric dioramas. Same prompt, two images, no model names. Pick the one you like better. Here’s what we found.

The rankings: text-to-image

| Rank | Model | Creator | Elo | Win Rate | Battles |
|------|-------|---------|-----|----------|---------|
| 1 | Gemini 3.1 Flash Image Preview (Nano Banana 2) | Google | 1299 | 79.7% | 74 |
| 2 | GPT Image 1.5 | OpenAI | 1283 | 69.9% | 216 |
| 3 | Gemini 3 Pro Image Preview (Nano Banana Pro) | Google | 1277 | 64.6% | 175 |
| 4 | FLUX.2 [dev] Turbo | fal | 1269 | 56.4% | 181 |
| 5 | FLUX.2 [max] | Black Forest Labs | 1266 | 58.7% | 167 |
| 6 | FLUX.2 [dev] Flash | fal | 1262 | 57.9% | 126 |
| 7 | Seedream 4.5 | ByteDance | 1261 | 58.9% | 158 |
| 8 | ImagineArt 1.5 (Preview) | Vyro AI | 1260 | 57.1% | 170 |
| 9 | FLUX.2 [pro] | Black Forest Labs | 1255 | 53.8% | 145 |
| 10 | Z-Image Turbo | Alibaba | 1252 | 46.5% | 198 |

Full rankings for all 29 models are at lumenfall.ai/leaderboard.

Five things we didn’t expect

  1. Google swept the top spots with its Nano Banana models. Gemini 3.1 Flash Image Preview (Nano Banana 2) is #1 with a 79.7% win rate. Its sibling, Gemini 3 Pro Image Preview (Nano Banana Pro), is #3. The Nano Banana family is carrying Google’s image-generation results right now. One caveat: Nano Banana 2 has only 74 battles so far, so its lead rests on a small sample.
  2. FLUX.2 has the deepest bench in the game. Black Forest Labs and fal’s in-house distilled versions together hold four of the top nine spots. FLUX.2 [dev] Turbo at #4. FLUX.2 [max] at #5. FLUX.2 [dev] Flash at #6. FLUX.2 [pro] at #9. No single FLUX.2 variant takes the crown, but the family’s consistency across different speed and quality tiers is unmatched. If you need a model that won’t embarrass you on any particular prompt, this family is hard to beat.
  3. GPT Image 1.5 is the most battle-tested model near the top. 216 matchups is the most of any model in the top three. A 69.9% win rate at that volume isn’t luck. It sits at #2 overall, just 16 Elo points behind Gemini 3.1 Flash Image Preview but with nearly three times the data behind it. Whatever OpenAI is doing with image generation, it’s working.
  4. The biggest surprises aren’t from the biggest companies. ByteDance’s Seedream 4.5 sits at #7 with a 58.9% win rate across 158 battles, wedged between FLUX.2 variants in one of the most competitive parts of the table. Vyro AI’s ImagineArt 1.5 holds #8 with 57.1% across 170 battles. Neither company dominates the Western AI image conversation, but both are outperforming brands with much more mindshare. ByteDance’s newer Seedream 5.0 Lite has also entered the arena (currently ~#13 with fewer battles so far) and shows early promise, especially in editing.
  5. Image editing rankings look nothing like generation rankings. Our editing leaderboard (1,517 matchups across 16 models) reshuffles the deck:

| Rank | Model | Creator | Elo | Battles |
|------|-------|---------|-----|---------|
| 1 | Gemini 3 Pro Image Preview (Nano Banana Pro) | Google | 1245 | 243 |
| 2 | Qwen Image Edit 2511 | Alibaba | 1230 | 546 |
| 3 | GPT Image 1.5 | OpenAI | 1230 | 190 |
| 4 | FLUX.2 [flex] | Black Forest Labs | 1227 | 102 |
| 5 | Gemini 2.5 Flash Image (Nano Banana) | Google | 1227 | 215 |

FLUX.2 [flex] jumps from 16th in generation to 4th in editing. Alibaba’s Qwen Image Edit 2511 has by far the most editing battles of any model (546, more than double the next closest) with a 56.2% win rate. It’s the editing workhorse. Generation skill doesn’t predict editing skill. Treat them as separate decisions.

Organization rankings

Which company has the strongest portfolio overall?

| Organization | Models | Avg Elo | Best Model |
|--------------|--------|---------|------------|
| fal | 2 | 1266 | FLUX.2 [dev] Turbo |
| OpenAI | 2 | 1265 | GPT Image 1.5 |
| Vyro AI | 1 | 1260 | ImagineArt 1.5 |
| Black Forest Labs | 4 | 1249 | FLUX.2 [max] |
| ByteDance | 3 | 1248 | Seedream 4.5 |
| xAI | 2 | 1236 | Grok Imagine Image Pro |
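
If you’re wondering how a portfolio table like this is derived, the sketch below shows one way to roll per-model ratings up to organizations with pandas. It uses only a subset of rows from the text-to-image table, so the averages won’t exactly match the full table, and the aggregation (mean Elo, best model by highest Elo) is our reading of the columns rather than a documented formula.

```python
import pandas as pd

# A few per-model rows from the text-to-image leaderboard above.
models = pd.DataFrame([
    ("GPT Image 1.5", "OpenAI", 1283),
    ("FLUX.2 [dev] Turbo", "fal", 1269),
    ("FLUX.2 [max]", "Black Forest Labs", 1266),
    ("FLUX.2 [dev] Flash", "fal", 1262),
    ("FLUX.2 [pro]", "Black Forest Labs", 1255),
], columns=["model", "org", "elo"])

# Roll up per organization: model count, mean Elo, highest-rated model.
orgs = (
    models.sort_values("elo", ascending=False)
          .groupby("org")
          .agg(models=("model", "count"),
               avg_elo=("elo", "mean"),
               best_model=("model", "first"))  # first row = highest Elo
          .sort_values("avg_elo", ascending=False)
)
print(orgs)
```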

How the arena works

We use TrueSkill, Microsoft’s Bayesian rating system, which updates both a model’s estimated skill and the system’s confidence in that estimate after every matchup. We display the results on an Elo-style scale (starting at 1000) because most people know that scale from chess. Beating a higher-rated model earns more points than beating a lower-rated one, just as in Elo, but TrueSkill converges faster and handles models with few battles more carefully.
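
To make that concrete, here’s a minimal sketch using the open-source trueskill Python package: one vote updates both models’ ratings, and the standard Elo formula turns a rating gap into an expected win probability. The mapping from TrueSkill’s internal numbers to our 1000-based display scale is invented here for illustration; it is not the arena’s actual code.

```python
import trueskill

# Use the tie rate observed in the arena (7.8%) as the draw probability.
env = trueskill.TrueSkill(draw_probability=0.078)

model_a = env.create_rating()  # every model starts from the same prior
model_b = env.create_rating()

# One blind matchup: the voter picked model A's image.
model_a, model_b = env.rate_1vs1(model_a, model_b)

# Conservative point estimate (mean skill minus 3 standard deviations),
# scaled to a 1000-based display. Illustrative mapping only.
def display_score(r, base=1000, scale=40):
    return base + scale * (r.mu - 3 * r.sigma)

print(round(display_score(model_a)), round(display_score(model_b)))

# Under classic Elo, a 16-point gap (e.g. #1 vs #2 above) implies
# an expected head-to-head win rate of about 52%:
print(1 / (1 + 10 ** (-16 / 400)))  # ~0.523
```

With the package’s default prior, the conservative estimate under this mapping starts at exactly 1000, which lines up with the displayed starting score (though the arena’s real mapping may differ).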

Voters see two images generated from the same prompt. They don’t know which model made which image. They pick the better one, or call it a tie.

A few numbers on the integrity of the data:

  • Left-side images won 51.0% of decisive (non-tie) votes, close to the 50% you’d expect if position doesn’t matter (a quick significance check follows this list).
  • Only 7.8% of matchups were ties, meaning voters could tell the difference in most head-to-head comparisons.
  • 20 competitions cover prompts ranging from “Adorable Baby Animals in Sunny Meadow” to “Apollo 11: Journey to Tranquility” to “Vintage Cafe Logo.”
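
The left-right split is easy to sanity-check yourself. Below is a quick sketch with SciPy, plugging in counts reconstructed from the figures above (3,509 total votes, 7.8% ties, 51.0% left wins among decisive votes); the exact counts are our back-of-envelope estimates, not official tallies.

```python
from scipy.stats import binomtest

total_votes = 3509
ties = round(total_votes * 0.078)      # ~274 ties
decisive = total_votes - ties          # ~3,235 decisive votes
left_wins = round(decisive * 0.510)    # ~1,650 left-side wins

# Two-sided test: does 51.0% differ meaningfully from the 50%
# we'd expect if image position had no effect on voting?
result = binomtest(left_wins, decisive, p=0.5)
print(f"left-win share: {left_wins / decisive:.3f}")
print(f"p-value: {result.pvalue:.3f}")
```

At these counts the one-point tilt toward the left is well within sampling noise, which supports reading the arena as position-neutral.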

Rankings update in real time on lumenfall.ai/leaderboard.

What to pick if you’re building something

If you need the best output and cost isn’t the constraint: Gemini 3.1 Flash Image Preview (Nano Banana 2) or GPT Image 1.5. Both are available through lumenfall.ai with a single integration.

If you’re watching costs: the FLUX.2 Turbo and Flash variants through fal are cheaper and still land in the top 6.

If you need image editing: look at Gemini 3 Pro Image Preview (Nano Banana Pro) and FLUX.2 [flex]. A model that generates well doesn’t necessarily edit well. The rankings are different enough that you should treat these as separate decisions.
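
To make the “separate decisions” point concrete, here’s a hypothetical sketch of routing each task to the model that wins at that task, behind a single integration. The endpoint, payload fields, and model IDs are invented for illustration; they are not Lumenfall’s actual API, so check lumenfall.ai for the real interface.

```python
import requests

# Hypothetical endpoint and model IDs -- illustrative only.
API_URL = "https://api.example.com/v1/images"
API_KEY = "YOUR_KEY"

# Route each task to the model that ranks best for *that* task.
TASK_MODELS = {
    "generate": "nano-banana-2",    # #1 on the text-to-image board
    "edit": "nano-banana-pro",      # #1 on the editing board
}

def run(task: str, prompt: str, image_url: str = "") -> bytes:
    payload = {"model": TASK_MODELS[task], "prompt": prompt}
    if image_url:
        payload["image_url"] = image_url  # editing needs a source image
    resp = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.content  # raw image bytes
```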

What’s next

This is the first edition. We’ll publish a Q2 update in July with more models and more votes. The arena is live and the rankings shift as new votes come in. 3,509 votes from 268 participants is a real start, not a definitive verdict. We’re being transparent about sample sizes because we think that makes the data more useful, not less. As the vote count grows, so will our confidence in the tighter matchups.
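
Here’s one way to see what sample size buys. The sketch below computes 95% Wilson confidence intervals for the top two text-to-image win rates using statsmodels; this is our own illustration of the uncertainty, not part of the arena’s ranking math, and the win counts are reconstructed from the published rates.

```python
from statsmodels.stats.proportion import proportion_confint

# (wins, battles) implied by the leaderboard's published win rates.
leaders = {
    "Nano Banana 2": (round(0.797 * 74), 74),    # ~59 of 74
    "GPT Image 1.5": (round(0.699 * 216), 216),  # ~151 of 216
}

for name, (wins, n) in leaders.items():
    lo, hi = proportion_confint(wins, n, alpha=0.05, method="wilson")
    print(f"{name}: {wins / n:.1%} win rate, 95% CI [{lo:.1%}, {hi:.1%}]")
```

With only 74 battles, Nano Banana 2’s interval is noticeably wider than GPT Image 1.5’s, and the two overlap; that’s exactly the kind of tight matchup more votes will resolve.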

If you want a say in the rankings: lumenfall.ai/arena.
If you want to run these models: lumenfall.ai.