Black Forest Labs' distilled 9-billion-parameter image generation model with sub-second inference and multi-reference support
Settled by community votes across 3 shared challenges, with an AI judge weighing in on each.
FLUX.2 [klein] 9B
#7 of 23 in Image Editing
Grok Imagine Image Pro
#14 of 44 in Text-to-Image
Where the votes landed
FLUX.2 [klein] 9B: 100.0% win rate
Ties: 0.0%
Grok Imagine Image Pro: 0.0% win rate
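The percentages above are consistent with a simple share of challenges won. The page does not say how community votes are aggregated, so the following is only a minimal sketch: it assumes each of the 3 shared challenges resolves to a single winner or a tie, and that the win rate is wins divided by total challenges. The function name `win_rates` and the per-challenge outcome list are hypothetical.

```python
from collections import Counter


def win_rates(outcomes, models):
    """Return each model's share of challenges won, plus ties, as percentages.

    `outcomes` holds one label per challenge: a model name or "tie".
    This is an assumed aggregation, not the site's documented method.
    """
    total = len(outcomes)
    if total == 0:
        raise ValueError("at least one challenge outcome is required")
    counts = Counter(outcomes)
    labels = list(models) + ["tie"]
    # Counter returns 0 for labels that never appear, so losers get 0.0.
    return {label: round(100.0 * counts[label] / total, 1) for label in labels}


# Hypothetical outcomes chosen to match the totals reported on the page.
outcomes = ["FLUX.2 [klein] 9B"] * 3
rates = win_rates(outcomes, ["FLUX.2 [klein] 9B", "Grok Imagine Image Pro"])
print(rates)
```

With only three challenges, a single result moving the other way would shift the figures by about 33 percentage points, so the split is best read as a small-sample outcome.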
Challenge by challenge
The strongest take from each model on every shared challenge, with the AI judge's read.
Chalkboard Menu
Text-to-Image: “Handwritten-style chalkboard menu in a cozy café, all text rendered in the exact same realistic chalk handwriting style with natural variations in letter size, slight slant, and chalk texture — no printed or digital fonts anywhere on the board. Title at the top in elegant cursive chalk handwriting: ‘TODAY’S SPECIALS – APRIL 30, 2026’. Below it, three menu items also in the same handwritten chalk style: ‘Truffle Mushroom Risotto – $24’, ‘Grilled Octopus with Lemon & Herbs – $28’, ‘Brown Butter Chocolate Chip Cookies – $9’. At the very bottom, smaller text in the identical handwritten chalk style (slightly smaller but still clearly legible with the same handwriting characteristics): ‘All items made fresh daily • Ask about our gluten-free options’. Warm ambient café lighting, visible chalk dust and smudges, realistic handwriting imperfections, no clean printed text anywhere.”
AI Judge Analysis
FLUX.2 [klein] 9B
- + Excellent chalk texture with realistic smudges and erase marks
- + Strong adherence to the 'elegant cursive' request for the title
- − Spelling error in the footer text ('fress' instead of 'fresh')
- − Slightly less legible than Grok Imagine Image Pro due to the heavy chalk smudging
Grok Imagine Image Pro
- + Perfect spelling and high legibility across all text
- + Very clean composition that feels balanced within the frame
- + Excellent interpretation of the handwritten prompt while remaining clear
- − Title is in a blocky print style rather than the requested 'elegant cursive'
- − Chalk texture is a bit more uniform and less 'messy' than a real used chalkboard
Verdict: Both models followed the prompt's long-form text requirements almost perfectly. FLUX.2 [klein] 9B followed the stylistic instruction for 'elegant cursive' and had better chalk texture, but failed on basic spelling in the footer; Grok Imagine Image Pro had perfect spelling and better clarity, making it the more functional menu despite missing the cursive title requirement.
Pose & Character Mashup
Editing: “Use Image 1 as the exact pose reference and Image 2 as the character reference. Recreate the person/character from Image 2 in the exact dynamic pose and body position from Image 1. Keep the exact face, hair, clothing style/details, and expression from Image 2. Match the lighting and environment of Image 1. The final image must show the character from Image 2 performing the precise action/pose from Image 1 with perfect anatomy and natural integration.”
AI Judge Analysis
FLUX.2 [klein] 9B
- + Successfully replicates the complex pose and red ottoman from Image 1.
- + High character fidelity including sunglasses, scarf, and black clothing from Image 2.
- + Matches the yellow studio lighting and background effectively.
- − One hand is clenched into a fist instead of the delicate finger pose in the source.
Grok Imagine Image Pro
- + Near-perfect preservation of the background and ottoman from Image 1.
- + Excellent anatomical placement of the character in the difficult pose.
- − Completely failed to use the character from Image 2, featuring a woman instead.
- − Dress code does not match Image 2, retaining the red hoodie from Image 1 instead.
Verdict: FLUX.2 [klein] 9B is the clear winner as it successfully synthesized the character reference into the pose reference, accurately capturing the man's face, sunglasses, and scarf. Grok Imagine Image Pro failed the primary instruction of character replacement, instead producing a slightly altered version of the original person in Image 1.
Outfit Transfer Challenge
Editing: “Use Image 1 as the base person. Dress them in the exact elaborate outfit from Image 2 (including all layers, accessories, jewelry, and shoes). Carefully adapt the clothing to the body shape and pose in Image 1 while maintaining realistic fabric behavior, correct proportions, and perfect lighting/shadow matching. Keep the person’s exact face, hair, and background completely unchanged.”
AI Judge Analysis
FLUX.2 [klein] 9B
- + Perfectly replicates the specific clothing items from Image 2 (peacoat, plaid scarf, jeans).
- + Maintains the exact facial identity and unique skin patterns of the person from Image 1.
- + Seamlessly integrates the scarf and coat into the existing lighting and pose.
- − Adds several gold necklaces and other jewelry that were not part of the outfit in Image 2.
- − Slightly alters the expression and head angle compared to the source person.
Grok Imagine Image Pro
- + Matches the high-quality resolution and lighting of the original background.
- + Maintains the facial characteristics and hair pattern from Image 1 reasonably well.
- − Completely fails to use the outfit from Image 2, selecting a different 'elaborate' costume instead.
- − Changes the skin color of the hands to white, failing to maintain the subject's vitiligo or natural skin tone.
- − Alters the lens perspective, making the person appear much shorter than the original.
Verdict: FLUX.2 [klein] 9B followed the instruction much better, correctly identifying and transferring the specific peacoat, plaid scarf, and jeans from Image 2. While it added extra jewelry not found in the source, it successfully preserved the subject's unique identity. Grok Imagine Image Pro failed significantly by ignoring the specific clothing components from Image 2 and introducing a major identity error by giving the subject white hands.
Explore each model
xAI's premium image generation model, offering higher-fidelity output and stronger performance on single-image editing benchmarks than the standard Grok Imagine model