Two-branch Recurrent Network for Isolating Deepfakes in Videos (2008.03412v3)

Published 8 Aug 2020 in cs.CV, cs.CY, and cs.LG

Abstract: The current spike of hyper-realistic faces artificially generated using deepfakes calls for media forensics solutions that are tailored to video streams and work reliably with a low false alarm rate at the video level. We present a method for deepfake detection based on a two-branch network structure that isolates digitally manipulated faces by learning to amplify artifacts while suppressing the high-level face content. Unlike current methods that extract spatial frequencies as a preprocessing step, we propose a two-branch structure: one branch propagates the original information, while the other branch suppresses the face content yet amplifies multi-band frequencies using a Laplacian of Gaussian (LoG) as a bottleneck layer. To better isolate manipulated faces, we derive a novel cost function that, unlike regular classification, compresses the variability of natural faces and pushes away the unrealistic facial samples in the feature space. Our two novel components show promising results on the FaceForensics++, Celeb-DF, and Facebook's DFDC preview benchmarks, when compared to prior work. We then offer a full, detailed ablation study of our network architecture and cost function. Finally, although the bar is still high to get very remarkable figures at a very low false alarm rate, our study shows that we can achieve good video-level performance when cross-testing in terms of video-level AUC.

Authors (5)

Iacopo Masi (28 papers)
Aditya Killekar (2 papers)
Royston Marian Mascarenhas (1 paper)
Shenoy Pratik Gurudatt (1 paper)
Wael AbdAlmageed (40 papers)

Citations (322)

Summary

The prevalence of hyper-realistic deepfake videos poses a significant challenge for media forensics, necessitating robust detection methods that maintain low false alarm rates. This paper introduces a novel approach for deepfake detection in videos using a two-branch recurrent network that effectively isolates digitally manipulated content.

Methodology Overview

The proposed approach uses a two-branch network architecture designed to amplify the artifacts characteristic of deepfakes while suppressing high-level facial content. Unlike prior methods that extract spatial frequencies as a preprocessing step, this method builds the frequency analysis into the network: one branch propagates the original input, while the second uses a Laplacian of Gaussian (LoG) as a bottleneck layer to suppress face content and amplify multi-band frequencies, exposing inconsistencies typical of deepfakes.
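To make the LoG idea concrete, here is a minimal sketch of a fixed Laplacian of Gaussian filter in plain Python. It is an illustration only: in the paper the LoG acts as a bottleneck layer inside a trained network, whereas the helper names `log_kernel` and `filter2d_valid`, the kernel size, and the value of sigma below are assumptions chosen for the example.

```python
import math

def log_kernel(size, sigma):
    """Build a size x size Laplacian-of-Gaussian kernel (hypothetical helper).

    The kernel is shifted to sum to zero so that flat image regions
    produce no response and only intensity changes are kept.
    """
    assert size % 2 == 1, "kernel size must be odd"
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = (x * x + y * y) / (2.0 * sigma * sigma)
            row.append(-(1.0 - r2) * math.exp(-r2) / (math.pi * sigma ** 4))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def filter2d_valid(image, kernel):
    """Slide the kernel over the image without padding ('valid' mode).

    The LoG kernel is symmetric, so correlation and convolution coincide.
    """
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        out_row = []
        for j in range(w - k + 1):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out_row.append(acc)
        out.append(out_row)
    return out

# A flat patch gives (near) zero response; a vertical step edge does not.
flat = [[5.0] * 7 for _ in range(7)]
edge = [[0.0] * 4 + [1.0] * 3 for _ in range(7)]
kernel = log_kernel(5, 1.0)
flat_response = filter2d_valid(flat, kernel)
edge_response = filter2d_valid(edge, kernel)
```

The zero-sum kernel is what "suppresses content": smooth, low-frequency regions cancel out, while sharp transitions such as blending boundaries survive. Applying several values of sigma would give the multi-band behavior the summary refers to.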

To better isolate manipulated faces, the researchers introduce a cost function that compresses the variability of genuine faces in the feature space while pushing deepfake samples away from them. Unlike a regular classification loss, which only separates the two classes, this objective shapes the feature space directly, promoting greater discrimination between real and manipulated facial sequences.
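The compress-and-push-away idea can be sketched as a distance-based loss. This is not the paper's exact formulation: the function name `isolation_loss`, the single center, and the two radii `r_real` and `r_fake` are assumptions introduced purely to illustrate the principle.

```python
import math

def isolation_loss(features, is_real, center, r_real=1.0, r_fake=2.0):
    """Hypothetical hinge-style loss around a center for real faces.

    Real samples are penalized when they lie farther than r_real from
    the center (compressing natural faces); fake samples are penalized
    when they lie closer than r_fake (pushing manipulations away).
    """
    total = 0.0
    for feat, real in zip(features, is_real):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(feat, center)))
        if real:
            total += max(0.0, dist - r_real)
        else:
            total += max(0.0, r_fake - dist)
    return total / len(features)

center = [0.0, 0.0]
# A real face at the center and a fake far away incur no penalty.
well_separated = isolation_loss([[0.0, 0.0], [3.0, 4.0]], [True, False], center)
# A real face far away and a fake at the center are both penalized.
badly_placed = isolation_loss([[3.0, 0.0], [0.0, 0.0]], [True, False], center)
```

The gap between the two radii plays the role of a margin: features are not just labeled real or fake, they are arranged so that natural faces form a tight cluster and manipulated ones fall outside it.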

Experimental Evaluation

The two-branch recurrent network was evaluated on three established benchmarks: FaceForensics++ (FF++), Celeb-DF, and the preview of Facebook's Deepfake Detection Challenge (DFDC). The results indicate a noteworthy reduction in false alarm rates at both the frame and video levels. In cross-dataset testing, the network achieves good video-level AUC, indicating improved generalization.

Moreover, the paper reports metrics suited to realistic deployment, such as True Acceptance Rate (TAR) at low False Acceptance Rates (FAR), standardized partial AUC, and truncated AUC (tAUC). The findings show better performance than previous methods on the FF++ and Celeb-DF datasets, in both accuracy and the low-false-alarm metrics.
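As an illustration of the first of these metrics, the sketch below computes TAR at a fixed FAR budget from raw scores. The function name `tar_at_far`, the convention that higher scores mean "more likely positive", and the strict-threshold tie handling are assumptions for this example, not details taken from the paper.

```python
def tar_at_far(pos_scores, neg_scores, far):
    """TAR at a given FAR budget (hypothetical helper).

    Picks the lowest threshold such that at most floor(far * N_neg)
    negatives score strictly above it, then returns the fraction of
    positives that score strictly above the same threshold.
    """
    allowed = int(far * len(neg_scores))  # false accepts we may tolerate
    ranked = sorted(neg_scores, reverse=True)
    if allowed >= len(ranked):
        threshold = float("-inf")
    else:
        threshold = ranked[allowed]
    accepted = sum(1 for s in pos_scores if s > threshold)
    return accepted / len(pos_scores)

negatives = [0.1, 0.2, 0.3, 0.4, 0.9]
positives = [0.95, 0.5, 0.35, 0.8]
tar_loose = tar_at_far(positives, negatives, far=0.2)   # one false accept allowed
tar_strict = tar_at_far(positives, negatives, far=0.0)  # no false accepts allowed
```

In this toy data a single high-scoring negative (0.9) forces the strict threshold up and costs most of the true accepts, which is why results at very low FAR are much harder to achieve than a good overall AUC.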

Implications and Future Directions

The approach delineated in this paper contributes significantly to advancing deepfake detection by refining the representation of facial embeddings to isolate manipulations. The implications are substantial for applications requiring high accuracy, such as video verification systems, digital forensics, and media integrity verification.

Looking ahead, further work could explore the integration of explainability mechanisms within the detection framework, potentially increasing user trust and insight into model decision processes. Enhancement strategies such as advanced data augmentation or leveraging extensive natural face datasets could also hone the model's robustness and adaptability to new deepfake technologies.

The paper's combination of a two-branch network structure with a feature-space cost function shows promising results for deepfake detection. As the authors note, the bar remains high for strong performance at very low false alarm rates, but the reported cross-test results show good video-level AUC.
