
Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection (2401.05746v1)

Published 11 Jan 2024 in cs.MM

Abstract: Audio-visual deepfake detection scrutinizes manipulations in public video using complementary multimodal cues. Current methods, which train on fused multimodal data for multimodal targets, face challenges due to uncertainties and inconsistencies in learned representations caused by independent modality manipulations in deepfake videos. To address this, we propose cross-modality and within-modality regularization to preserve modality distinctions during multimodal representation learning. Our approach includes an audio-visual transformer module for modality correspondence and a cross-modality regularization module that aligns paired audio-visual signals while preserving modality distinctions. Simultaneously, a within-modality regularization module refines unimodal representations with modality-specific targets to retain modality-specific details. Experimental results on the public audio-visual dataset FakeAVCeleb demonstrate the effectiveness and competitiveness of our approach.

Authors (6)
  1. Heqing Zou (15 papers)
  2. Meng Shen (37 papers)
  3. Yuchen Hu (60 papers)
  4. Chen Chen (753 papers)
  5. Eng Siong Chng (112 papers)
  6. Deepu Rajan (14 papers)
Citations (5)
