DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection (2001.03024v2)

Published 9 Jan 2020 in cs.CV and cs.LG

Abstract: We present our on-going effort of constructing a large-scale benchmark for face forgery detection. The first version of this benchmark, DeeperForensics-1.0, represents the largest face forgery detection dataset by far, with 60,000 videos constituted by a total of 17.6 million frames, 10 times larger than existing datasets of the same kind. Extensive real-world perturbations are applied to obtain a more challenging benchmark of larger scale and higher diversity. All source videos in DeeperForensics-1.0 are carefully collected, and fake videos are generated by a newly proposed end-to-end face swapping framework. The quality of generated videos outperforms those in existing datasets, validated by user studies. The benchmark features a hidden test set, which contains manipulated videos achieving high deceptive scores in human evaluations. We further contribute a comprehensive study that evaluates five representative detection baselines and make a thorough analysis of different settings.

An Overview of DeeperForensics-1.0 Dataset for Face Forgery Detection

The paper "DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection" introduces a comprehensive dataset configured to improve the detection of manipulated facial videos. The DeeperForensics-1.0 dataset is highlighted for its extensive scale, containing 60,000 videos and totaling over 17.6 million frames. This dataset is ten times larger than those previously available, offering improved resources for training models tasked with identifying deepfakes.

Construction and Characteristics of DeeperForensics-1.0

A significant feature of DeeperForensics-1.0 is its attention to realism and diversity. The dataset includes varied real-world perturbations, such as compression artifacts and transmission errors, which challenge detection models by closely mimicking the conditions of real video capture and distribution. The dataset also includes a hidden test set of manipulated videos that achieved high deceptive scores in human evaluations, further enhancing its utility for modeling realistic scenarios.
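
The exact perturbation pipeline is not reproduced in this summary, but a minimal sketch of how such distortions could be applied to individual frames might look as follows (using OpenCV and NumPy; the function names and parameter values are illustrative assumptions, not the authors' code):

```python
# Illustrative sketch only: applies benchmark-style perturbations
# (compression, blur, noise) to a single frame. Not the authors' pipeline.
import cv2
import numpy as np

def jpeg_compress(frame: np.ndarray, quality: int = 30) -> np.ndarray:
    """Simulate lossy compression artifacts by re-encoding the frame as JPEG."""
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

def gaussian_blur(frame: np.ndarray, ksize: int = 7) -> np.ndarray:
    """Simulate out-of-focus or heavily downscaled video."""
    return cv2.GaussianBlur(frame, (ksize, ksize), 0)

def additive_noise(frame: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Simulate sensor or transmission noise with white Gaussian noise."""
    noise = np.random.normal(0.0, sigma, frame.shape)
    return np.clip(frame.astype(np.float64) + noise, 0, 255).astype(np.uint8)

def perturb(frame: np.ndarray) -> np.ndarray:
    """Apply a random subset of perturbations, mimicking mixed real-world conditions."""
    for op in (jpeg_compress, gaussian_blur, additive_noise):
        if np.random.rand() < 0.5:
            frame = op(frame)
    return frame
```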

To generate fake videos, the authors introduce a new end-to-end face swapping framework, the DeepFake Variational Auto-Encoder (DF-VAE), which improves the quality of generated content by focusing on style matching and temporal consistency. Source videos were collected in controlled environments from 100 actors who gave informed consent, ensuring the ethical use of facial data.
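
The DF-VAE architecture itself is not detailed in this summary; the sketch below only illustrates the generic variational autoencoding idea that such a framework builds on, written in PyTorch. The layer sizes, latent dimension, and loss formulation are hypothetical, and DF-VAE's style-matching and temporal-consistency components are omitted:

```python
# Minimal VAE sketch for face reconstruction. This shows only the generic
# variational autoencoding idea behind frameworks like DF-VAE; it is not
# the authors' architecture.
import torch
import torch.nn as nn

class FaceVAE(nn.Module):
    def __init__(self, latent_dim: int = 256):
        super().__init__()
        # Encoder: 3x128x128 face crop -> latent mean and log-variance
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # -> 64x64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # -> 32x32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # -> 16x16
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(128 * 16 * 16, latent_dim)
        # Decoder: latent code -> reconstructed face
        self.fc_dec = nn.Linear(latent_dim, 128 * 16 * 16)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.decoder(self.fc_dec(z).view(-1, 128, 16, 16))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    """Standard VAE objective: reconstruction error plus KL divergence."""
    rec = nn.functional.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld
```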

Performance Evaluation

The paper conducts a comprehensive evaluation of five baseline video forgery detection methods, highlighting the strengths and weaknesses of these approaches when applied to DeeperForensics-1.0. The results show that while these methods achieve high accuracy on the standard test set, the introduction of diverse perturbations and the hidden test set poses substantial challenges and reduces their effectiveness. These findings suggest that the baselines need further refinement to remain robust against real-world deepfakes with unpredictable variations and sources.
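
As a rough illustration of this kind of robustness comparison, one might measure a detector's accuracy separately on clean and perturbed test frames. The detector interface below is a placeholder, not one of the paper's baselines or its evaluation protocol:

```python
# Sketch of comparing a forgery detector on standard vs. perturbed test data.
# `detector` is any callable mapping a frame to 0 (real) or 1 (fake).
import numpy as np

def accuracy(detector, frames: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of frames where the predicted label matches ground truth."""
    preds = np.array([detector(f) for f in frames])
    return float((preds == labels).mean())

def robustness_gap(detector, clean_frames, perturbed_frames, labels):
    """How much accuracy the detector loses once perturbations are added."""
    clean_acc = accuracy(detector, clean_frames, labels)
    pert_acc = accuracy(detector, perturbed_frames, labels)
    return clean_acc, pert_acc, clean_acc - pert_acc
```

A large gap between the two accuracies indicates a detector that overfits to clean footage, which is the failure mode the perturbed and hidden test sets are designed to expose.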

Implications and Future Directions

The implications of this dataset are profound in the context of digital security, media verification, and AI ethics. DeeperForensics-1.0 not only provides a potent tool for advancing face forgery detection but also sets a precedent for future datasets in terms of ethical data collection and realistic simulation of manipulation techniques.

Future research directions include expanding the dataset, improving detection methodologies to better handle the hidden test set, and exploring more sophisticated generative models for detecting and resisting increasingly realistic forgeries. The meticulous construction and benchmarking approach presented in this paper offer a robust foundation for tackling the challenges posed by deepfake technology.

Authors (5)
  1. Liming Jiang (29 papers)
  2. Ren Li (19 papers)
  3. Wayne Wu (60 papers)
  4. Chen Qian (226 papers)
  5. Chen Change Loy (288 papers)
Citations (400)