Frame-level anomaly maps for segment-level localization in partial audio deepfake detection
Develop frame-level anomaly mapping methods that operate on frozen speech foundation model embeddings and localize manipulated regions of partially spoofed audio at the segment level, directly addressing the weakness on short spoofed segments observed on the HAD and ADD 2023 benchmarks.
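One possible reading of "frame-level anomaly map" is sketched below, purely as an illustration and not as the method from the referenced work. It assumes per-frame embeddings have already been extracted from a frozen model, scores each frame by its mean squared z-score against statistics estimated from bona fide speech, and merges consecutive flagged frames into time segments. The function names, the threshold, the 20 ms frame hop, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np


def frame_anomaly_scores(frames, ref_mean, ref_std, eps=1e-8):
    """Per-frame anomaly score: mean squared z-score against reference stats.

    frames:   (T, D) array of frame embeddings from a frozen model (assumed given).
    ref_mean: (D,) mean of bona fide reference embeddings.
    ref_std:  (D,) standard deviation of bona fide reference embeddings.
    """
    z = (frames - ref_mean) / (ref_std + eps)
    return np.mean(z ** 2, axis=1)


def scores_to_segments(scores, threshold, frame_hop_s=0.02, min_frames=1):
    """Merge runs of above-threshold frames into (start_s, end_s) segments.

    frame_hop_s is an assumed hop of 20 ms per frame; adjust it to the
    actual frame rate of the embedding model in use.
    """
    flagged = scores > threshold
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], flagged, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]  # ends are exclusive frame indices
    return [
        (float(s * frame_hop_s), float(e * frame_hop_s))
        for s, e in zip(starts, ends)
        if e - s >= min_frames
    ]


# Synthetic stand-in for real embeddings (hypothetical data, not benchmark audio).
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 16))   # bona fide reference frames
ref_mean = reference.mean(axis=0)
ref_std = reference.std(axis=0)

utterance = rng.normal(size=(100, 16))   # 100 frames, about 2 s at a 20 ms hop
utterance[40:45] += 6.0                  # simulate a short manipulated region

scores = frame_anomaly_scores(utterance, ref_mean, ref_std)
segments = scores_to_segments(scores, threshold=10.0)
print(segments)
```

The score array is the anomaly map itself, and the segment list is the localization output. In practice the threshold and the minimum segment length would need to be tuned on held-out data, and a short `min_frames` matters here because the target weakness is precisely short spoofed segments.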
References
Several directions remain open: frame-level anomaly maps could enable segment-level localization, directly addressing the short-spoof-segment weakness on HAD and ADD 2023; multi-layer fusion across layers 15--21 may improve robustness beyond the single optimal layer; and the same paradigm could extend beyond audio to deepfake face detection via vision transformers, machine-generated text detection via LLMs, or cross-modal consistency verification in multimodal foundation models.