Linking generative quality to IFL performance

Establish whether and how generative quality measures predict or explain the performance of image forgery localization (IFL) models on TGIF2. Candidate measures include aesthetic quality (NIMA, GIQA), image–text matching, and preservation fidelity (PSNR, SSIM, LPIPS). A related question is whether such measures account for performance discrepancies across generative models and subsets.

Background

The authors evaluated several generative quality metrics on inpainted regions and on real-but-regenerated regions, and analyzed how these metrics correlate with IFL performance in both the original (spliced) and fine-tuned (fully regenerated) settings.

Their analyses did not reveal conclusive relationships, and they could not explain observed performance differences (e.g., between SD2 and FLUX.1 after fine-tuning) via these metrics, leaving the relationship between generative quality and localization performance unresolved.
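
A correlation analysis of the kind described above could be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes one quality score and one localization score per image, and it uses Spearman rank correlation, which the source does not name. The function names and all numbers are hypothetical placeholders, not values from TGIF2.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-image values (placeholders, NOT results from the paper):
# an LPIPS-like quality score and an F1-like localization score.
quality_scores = [0.10, 0.20, 0.30, 0.40]
ifl_scores = [0.9, 0.7, 0.8, 0.2]

rho = spearman(quality_scores, ifl_scores)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient near zero would match the inconclusive outcome reported here. A strong coefficient would still not show that generative quality causes the performance difference. Running the same computation separately per generative model and per subset is one way to check whether a metric tracks the observed discrepancies.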

References

However, we could not establish conclusive relations between generative quality and IFL performance.

TGIF2: Extended Text-Guided Inpainting Forgery Dataset & Benchmark (2603.28613 - Mareen et al., 30 Mar 2026) in Section 5, Discussion and Conclusion