Distilled version of HiDream AI's 17B-parameter text-to-image model
Settled by community votes across 1 shared challenge, with an AI judge weighing in on each.
HiDream I1 Fast
#38 of 44 in Text-to-Image
Stable Diffusion 3.5 Large
#25 of 44 in Text-to-Image
Where the votes landed
HiDream I1 Fast
22.2%
win rate
Ties
11.1%
Stable Diffusion 3.5 Large
66.7%
win rate
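The three shares above add up to 100%, so each is presumably that outcome's fraction of all votes cast. The page does not publish raw vote counts. The 2 / 1 / 6 split below is a hypothetical example that happens to reproduce the displayed figures, and `vote_shares` is an illustrative helper, not part of any arena API.

```python
def vote_shares(wins_a, ties, wins_b):
    """Return each outcome's share of all votes, as percentages rounded to 0.1."""
    total = wins_a + ties + wins_b
    if total == 0:
        raise ValueError("no votes recorded")
    return tuple(round(100 * n / total, 1) for n in (wins_a, ties, wins_b))


# Hypothetical counts (not taken from the page): 2 wins, 1 tie, 6 wins.
hidream, ties, sd35 = vote_shares(2, 1, 6)
print(f"HiDream I1 Fast: {hidream}%  Ties: {ties}%  Stable Diffusion 3.5 Large: {sd35}%")
```

Any multiple of that split (4 / 2 / 12, and so on) gives the same percentages, so the displayed rates alone do not reveal how many votes were cast.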
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Fantasy Warrior
Text-to-Image: “Close portrait of a battle-worn paladin in ornate engraved plate armor, hair braided with small beads, faint scars and dirt on the skin, warm torchlight reflecting off metal, shallow depth of field, bokeh sparks, lifelike eyes, highly detailed texture on leather straps and cloth underlayer.”
AI Judge Analysis
HiDream I1 Fast
- + Excellent depiction of the requested beads in the braids
- + Strong use of warm torchlight and bokeh sparks
- + Intricately detailed engraving on the plate armor
- − The facial scars look painted on rather than realistic skin texture
- − The lighting on the face is a bit oversaturated and flat
Stable Diffusion 3.5 Large
- + Extremely realistic skin texture with lifelike dirt and scars
- + Superior rendering of the cloth and chainmail underlayers
- + More natural and cinematic lighting
- − Missed the request for beads in the hair braids
- − The background characters distract slightly from the 'close portrait' feel
Verdict: HiDream I1 Fast followed the specific detail of the beads in the hair much better and captured the fantasy 'paladin' aesthetic with vibrant torchlight. However, Stable Diffusion 3.5 Large produced a significantly more lifelike image with superior textures on the skin, armor, and fabric, making it the more visually impressive result despite missing the beads.
Explore each model
Stability AI's 8.1-billion-parameter Multimodal Diffusion Transformer (MMDiT) text-to-image model, featuring improved image quality, typography, complex prompt understanding, and resource efficiency