Behavior of confidence scores under distribution shifts in multimedia forensics

Characterize how the confidence scores produced by multimedia forensic detectors behave when the test distribution differs from the training distribution, and in particular whether and how these scores miscalibrate under such shifts.

Background

The paper advocates for AI forensic agents that orchestrate multiple detectors and explicitly quantify uncertainty. A key requirement is that these systems not only provide predictions but also calibrated confidence scores and the ability to abstain when evidence is insufficient.
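The decision rule implied by this requirement can be sketched as a simple selective-prediction policy: fuse the calibrated detector scores and abstain when the fused confidence is not decisive. The averaging fusion and the 0.85 threshold below are illustrative assumptions, not choices made by the paper.

```python
def fuse_and_decide(detector_scores, threshold=0.85):
    """Selective prediction over multiple forensic detectors.

    detector_scores: calibrated probabilities of manipulation, one per
    detector (assumed already calibrated; fusion here is a plain average).
    Returns a (verdict, fused_confidence) pair, abstaining when the
    evidence is not decisive either way.
    """
    fused = sum(detector_scores) / len(detector_scores)
    if fused >= threshold:
        return "fake", fused
    if fused <= 1.0 - threshold:
        return "real", fused
    # Evidence is insufficient: escalate instead of guessing.
    return "abstain", fused
```

For example, two detectors reporting 0.5 and 0.6 yield a fused confidence of 0.55, which falls in the abstention band rather than forcing a verdict. Any more realistic fusion (weighted, learned, or uncertainty-aware) would slot into the same interface.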

The authors note that current multimedia forensics often treats uncertainty superficially. Although detectors produce a variety of scores (detection, similarity, and anomaly scores), these are rarely calibrated into trustworthy confidences, and, crucially, how they behave under distribution shifts remains unknown. This gap impedes reliable fusion and decision-making in real-world settings, where data distributions frequently change.
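One standard way to quantify the miscalibration in question is the Expected Calibration Error (ECE): bin predictions by confidence and compare each bin's average confidence to its empirical accuracy. The sketch below is a minimal binned-ECE implementation, not the paper's protocol; in the study envisioned here, it would be computed once on in-distribution test data and once on shifted data, with the gap between the two measuring how calibration degrades.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - confidence| over bins.

    confidences: predicted confidence per sample, in [0, 1].
    correct: 1 if the prediction was right, else 0.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        bin_conf = confidences[mask].mean()   # mean confidence in bin
        bin_acc = correct[mask].mean()        # empirical accuracy in bin
        ece += mask.mean() * abs(bin_acc - bin_conf)
    return ece
```

A perfectly calibrated detector (e.g. 90% confident and right 90% of the time) scores near zero; a detector that stays highly confident while its accuracy collapses on shifted data scores close to its confidence, which is precisely the failure mode the proposed work aims to characterize.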

References

In current multimedia forensics practice, uncertainty is often treated superficially: confidence scores are rarely calibrated, and their behavior under distribution shifts is largely unknown.

Don't Guess, Escalate: Towards Explainable Uncertainty-Calibrated AI Forensic Agents (2512.16614 - Boato et al., 18 Dec 2025) in Section: TRUSTWORTHY FORENSIC AGENTS