Explaining higher FR detectability for FLUX.1 versus Stable Diffusion after fine-tuning

Determine why fully regenerated (FR) images produced by FLUX.1 models are easier to detect and localize than FR images produced by Stable Diffusion (SD) once image forgery localization (IFL) models have been fine-tuned on TGIF2, and characterize the image properties responsible for this discrepancy.

Background

After fine-tuning high-performing IFL models (TruFor and MMFusion) on fully regenerated subsets, the authors observe substantially better localization performance on FLUX.1-generated FR images than on those generated by SD (SD2), a counterintuitive result given that FLUX.1 generally produces higher-quality images.

They hypothesized that generative quality might explain the discrepancy, but subsequent analysis of aesthetic, image–text matching, and preservation fidelity metrics found no clear relationship, leaving the cause of the improved detectability for FLUX.1 FR images unresolved.
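One way to probe a relationship of this kind is a rank correlation between a per-image quality score and a per-image localization score. The sketch below is only an illustration of that idea, not the authors' analysis: the function names (`average_ranks`, `pearson`, `spearman`) are hypothetical, the choice of Spearman correlation is an assumption, and the numbers in the usage lines are made-up placeholders rather than results from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder values for illustration only (NOT taken from the paper):
# one quality score and one localization score per image.
quality_scores = [5.1, 6.3, 4.8, 7.0, 5.9]
localization_scores = [0.62, 0.58, 0.71, 0.60, 0.66]
rho = spearman(quality_scores, localization_scores)
```

A coefficient near zero across generators would be consistent with the "no clear relationship" finding described above, although a rank correlation on its own cannot rule out other explanations such as generator-specific artifacts.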

References

It remains to be investigated, in future work, why FLUX.1 fully regenerated images are easier to detect than SD, after fine-tuning.

TGIF2: Extended Text-Guided Inpainting Forgery Dataset & Benchmark (2603.28613 - Mareen et al., 30 Mar 2026) in Section 4.3, Fine-tuning IFL Approaches on Fully Regenerated Images