OpenAI's legacy image generation model, supporting generation, mask-based edits (inpainting), and variations.
Settled by community votes across 1 shared challenge, with an AI judge weighing in on each.
DALL-E 2
#37 of 44 in Text-to-Image
Not enough comparable category data
The chart appears once both models have ratings across at least three shared arena categories.
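The eligibility rule above can be sketched in a few lines. This is only an illustration, not the site's implementation: the function names, the dictionary layout, and the placeholder rating values are all assumptions. Only the threshold of three shared categories comes from the text.

```python
# Threshold stated on the page: at least three shared arena categories.
MIN_SHARED_CATEGORIES = 3


def shared_rated_categories(ratings_a, ratings_b):
    """Return the sorted categories in which both models have a rating.

    Each argument is a hypothetical mapping of category name -> rating,
    where None means "no rating yet".
    """
    return sorted(
        category
        for category in ratings_a.keys() & ratings_b.keys()
        if ratings_a[category] is not None and ratings_b[category] is not None
    )


def can_show_chart(ratings_a, ratings_b, minimum=MIN_SHARED_CATEGORIES):
    """True once both models are rated in at least `minimum` shared categories."""
    return len(shared_rated_categories(ratings_a, ratings_b)) >= minimum


# Placeholder data: the rating values are invented for illustration only.
dalle2 = {"Text-to-Image": 1000.0}
flux_dev = {"Text-to-Image": 1000.0}

print(can_show_chart(dalle2, flux_dev))
```

With only one shared category, as in this comparison, the check fails and the page shows the "Not enough comparable category data" message instead of a chart.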
FLUX.1 [dev]
#42 of 44 in Text-to-Image
Where the votes landed
DALL-E 2: 0% win rate
Ties: 0%
FLUX.1 [dev]: 0% win rate
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
The Halloween Invitation
Text-to-Image
“Vintage gothic Halloween party invitation. Dark parchment poster, spooky border with webs and thorns, central glowing jack-o-lantern, bats, twisted trees, moody night sky. Add elegant gothic title text saying "Halloween Party Invitation", a small scroll banner saying "You are invited to a night of frights", and event details at the bottom: Date: 30.10.2026 Time: 7pm Location: The Arches, NYC Spooky but polished, cinematic lighting, square format.”
AI Judge Analysis
DALL-E 2
- + Captures a strong vintage hand-painted aesthetic.
- + Correctly incorporates the requested twisted trees and bats into the background design.
- − Text is largely illegible and contains various artifacts.
- − Missing the central glowing jack-o-lantern and specific event details.
- − Poor image clarity and resolution compared to modern standards.
FLUX.1 [dev]
- + High visual quality with sharp details and cinematic lighting.
- + Follows most prompt instructions including the jack-o-lantern, thorns, and bats.
- + Includes almost all the specific text requested with high legibility.
- − Minor typos in text such as 'Falloween Ranty', 'Tre', and 'Archzs'.
- − Repeats the time '7pm' twice in the event details.
Verdict: FLUX.1 [dev] is the clear winner: it incorporates most elements of the prompt, including the jack-o-lantern and readable event details. DALL-E 2 produces largely illegible text and misses core visual components such as the pumpkin, resulting in a cluttered, blurry composition.
Explore each model
FLUX.1 [dev]: Black Forest Labs' 12-billion-parameter flow transformer for high-quality text-to-image generation, suitable for personal and commercial use with streaming support.