
FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals (1901.02212v3)

Published 8 Jan 2019 in cs.CV

Abstract: The recent proliferation of fake portrait videos poses direct threats on society, law, and privacy. Believing the fake video of a politician, distributing fake pornographic content of celebrities, fabricating impersonated fake videos as evidence in courts are just a few real world consequences of deep fakes. We present a novel approach to detect synthetic content in portrait videos, as a preventive solution for the emerging threat of deep fakes. In other words, we introduce a deep fake detector. We observe that detectors blindly utilizing deep learning are not effective in catching fake content, as generative models produce formidably realistic results. Our key assertion follows that biological signals hidden in portrait videos can be used as an implicit descriptor of authenticity, because they are neither spatially nor temporally preserved in fake content. To prove and exploit this assertion, we first engage several signal transformations for the pairwise separation problem, achieving 99.39% accuracy. Second, we utilize those findings to formulate a generalized classifier for fake content, by analyzing proposed signal transformations and corresponding feature sets. Third, we generate novel signal maps and employ a CNN to improve our traditional classifier for detecting synthetic content. Lastly, we release an "in the wild" dataset of fake portrait videos that we collected as a part of our evaluation process. We evaluate FakeCatcher on several datasets, resulting with 96%, 94.65%, 91.50%, and 91.07% accuracies, on Face Forensics, Face Forensics++, CelebDF, and on our new Deep Fakes Dataset respectively. We also analyze signals from various facial regions, under image distortions, with varying segment durations, from different generators, against unseen datasets, and under several dimensionality reduction techniques.

Authors (2)
  1. Umur Aybars Ciftci (5 papers)
  2. Ilke Demir (12 papers)
Citations (342)

Summary

Detection of Synthetic Portrait Videos Using Biological Signals

The paper by Ciftci, Demir, and Yin addresses the challenging problem of detecting synthetic portrait videos, often referred to as "deep fakes." The authors propose a novel approach that leverages biological signals embedded in portrait videos to distinguish authentic content from fake content, targeting the inconsistencies introduced by generative models that fail to replicate human physiological processes. This research is timely given the increasing prevalence of highly realistic synthetic videos and the societal challenges they pose, including misinformation and privacy violations.

Key Findings and Methodology

The authors introduce "FakeCatcher," a system that systematically identifies synthetic content by analyzing biological signals such as photoplethysmography (PPG), which are hidden yet inherently present in portrait videos. The detection methodology is predicated on the observation that generative models currently lack the fidelity to capture the nuanced biological signals that occur naturally within video sequences.
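To make the idea of a hidden biological signal concrete, the sketch below extracts a naive green-channel PPG trace by averaging pixel intensities over a fixed facial region in each frame. This is an illustration only, not the authors' implementation; the video path and region coordinates are placeholders, and a real system would detect and track the face per frame.

```python
# Minimal sketch: average the green channel over a fixed face region per frame
# to obtain a crude PPG trace. Path and ROI are hypothetical placeholders.
import cv2
import numpy as np

def green_ppg_trace(video_path, roi):
    """Return the mean green-channel value of roi = (x, y, w, h) for each frame."""
    x, y, w, h = roi
    cap = cv2.VideoCapture(video_path)
    trace = []
    while True:
        ok, frame = cap.read()                   # OpenCV frames are BGR
        if not ok:
            break
        patch = frame[y:y + h, x:x + w]
        trace.append(patch[:, :, 1].mean())      # index 1 = green channel
    cap.release()
    return np.asarray(trace)

# Example usage (hypothetical file and region):
# signal = green_ppg_trace("portrait.mp4", roi=(120, 80, 96, 96))
```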

Signal Analysis: The researchers exploit multiple forms of PPG—chrominance-based and green channel-based PPG—from different facial regions. This multi-signal approach enhances robustness against variations in video quality and lighting conditions. Initial analysis involves comparing statistical and frequency-domain features of these signals between real and synthetic video pairs, revealing significant discrepancies that contribute to the detection process.
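As a rough illustration of how a chrominance-based PPG signal and frequency-domain features might be computed, the sketch below combines per-frame mean RGB traces in the spirit of the CHROM method and derives a few simple statistics. The normalization, the 30 fps assumption, and the feature names are illustrative choices, not taken from the paper.

```python
# Hedged sketch: build a chrominance-based PPG signal from per-frame mean
# R, G, B traces over a face region, then compute basic statistical and
# spectral features. Constants and feature names are illustrative.
import numpy as np

def chrom_ppg(r, g, b):
    """r, g, b: 1-D arrays of per-frame mean channel values over a face region."""
    r, g, b = (c / c.mean() for c in (r, g, b))   # normalize each channel
    x = 3.0 * r - 2.0 * g
    y = 1.5 * r + g - 1.5 * b
    alpha = x.std() / y.std()
    return x - alpha * y

def signal_features(s, fps=30.0):
    """A handful of simple statistical and frequency-domain descriptors."""
    spectrum = np.abs(np.fft.rfft(s - s.mean()))
    freqs = np.fft.rfftfreq(len(s), d=1.0 / fps)
    return {
        "mean": s.mean(),
        "std": s.std(),
        "dominant_freq_hz": freqs[spectrum.argmax()],  # peak in the heart-rate band
        "spectral_energy": (spectrum ** 2).sum(),
    }
```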

Classifier Development: The authors combine signal transformations with feature engineering to construct a feature space that captures authenticity markers. A support vector machine (SVM) serves as the classifier in this space, and the overall system reaches accuracies of 91.50% on Celeb-DF and 96% on Face Forensics. Because the method does not hinge on the particular generative model used to synthesize the videos, it generalizes across deep fake sources.
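A minimal, hypothetical example of this classification stage with scikit-learn is shown below; the feature dimensionality and data are synthetic stand-ins rather than the paper's actual feature set.

```python
# Illustrative only: train an SVM on per-video feature vectors.
# The 64-dimensional features and labels here are random stand-ins.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))      # 200 videos x 64 PPG-derived features (dummy)
y = rng.integers(0, 2, size=200)    # 1 = real, 0 = fake (dummy labels)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print(cross_val_score(clf, X, y, cv=5).mean())
```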

CNN Architecture: For further refinement, the authors develop convolutional neural networks (CNNs) trained on "PPG maps"—transformations of biological signal data into spatial-temporal image-like representations. This approach boosts detection accuracy by capturing temporal consistency and spatial coherence in a manner more adaptable to variations in video content.
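The sketch below shows one plausible way to assemble such a PPG map, by stacking fixed-length windows of per-region PPG traces into a 2-D array, and to classify it with a small CNN in PyTorch. The map size of 32 regions by 128 frames and the network layout are assumptions for illustration, not the paper's exact architecture.

```python
# A minimal sketch, assuming a "PPG map" stacks per-region PPG windows into a
# 2-D array that a small CNN classifies as real vs. fake. Sizes and layers
# are illustrative assumptions, not the paper's architecture.
import numpy as np
import torch
import torch.nn as nn

def build_ppg_map(region_traces, window=128):
    """region_traces: list of 1-D arrays, one PPG trace per face sub-region."""
    rows = [t[:window] for t in region_traces]
    m = np.stack(rows).astype(np.float32)           # shape: (regions, window)
    m = (m - m.mean()) / (m.std() + 1e-8)           # normalize the map
    return torch.from_numpy(m).unsqueeze(0)         # add channel dimension

class PPGMapCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 2),                       # real vs. fake logits
        )

    def forward(self, x):
        return self.net(x)

# Example: 32 sub-region traces of 128 frames -> one (1, 1, 32, 128) input.
traces = [np.random.rand(128) for _ in range(32)]
logits = PPGMapCNN()(build_ppg_map(traces).unsqueeze(0))
```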

Results and Evaluation

A rigorous evaluation on diverse datasets, including the newly introduced Deep Fakes Dataset, supports the efficacy of the "FakeCatcher" system. By emphasizing the biological signal's spatial coherence and temporal consistency as key discriminators between real and fake content, the paper underscores the inadequacies of pure machine learning approaches that neglect these physiological markers.

  • Numerical Performance: The system achieves accuracies of 96% on Face Forensics, 94.65% on Face Forensics++, 91.50% on Celeb-DF, and 91.07% on the Deep Fakes Dataset, showcasing its robustness across datasets with varying compression levels, resolutions, and generative models.
  • Cross-Dataset and Model Testing: The effectiveness of the approach is also validated through cross-dataset evaluations, demonstrating substantial robustness and adaptability across different styles of synthetic video content.

Theoretical and Practical Implications

The research not only provides a practical tool for video authenticity verification but also contributes theoretically by highlighting the potential of biological signals as a discriminator in synthetic media detection. The paper calls for future exploration of "BioGAN" models, which would integrate biological markers into the generative adversarial networks themselves, potentially improving the realism of generated videos while preserving the possibility of authenticity checks.

Future Directions

Future work may explore extensions of the proposed methodology to non-human content or refine the integration of biological signal fidelity in generative models. Improved face detection modules that better accommodate varying facial attributes and dynamics under adverse conditions could also augment the robustness of the framework.

In conclusion, this paper presents a compelling strategy for tackling the pervasive issue of synthetic portrait videos, emphasizing the gap between what generative models can produce and the biological processes they fail to replicate. The mapping of these signals into a computational framework exemplifies a significant step forward in video forensics, with both immediate applications and long-term research potential.
