
Audio-driven Talking Face Generation with Stabilized Synchronization Loss (2307.09368v3)

Published 18 Jul 2023 in cs.CV

Abstract: Talking face generation aims to create realistic videos with accurate lip synchronization and high visual quality, using given audio and reference video while preserving identity and visual characteristics. In this paper, we start by identifying several issues with existing synchronization learning methods. These involve unstable training, lip synchronization, and visual quality issues caused by lip-sync loss, SyncNet, and lip leaking from the identity reference. To address these issues, we first tackle the lip leaking problem by introducing a silent-lip generator, which changes the lips of the identity reference to alleviate leakage. We then introduce stabilized synchronization loss and AVSyncNet to overcome problems caused by lip-sync loss and SyncNet. Experiments show that our model outperforms state-of-the-art methods in both visual quality and lip synchronization. Comprehensive ablation studies further validate our individual contributions and their cohesive effects.
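For context, the conventional lip-sync loss that the abstract identifies as a source of problems is commonly implemented in the Wav2Lip style. A pretrained SyncNet embeds a short window of generated lip frames and the matching audio. The cosine similarity of the two embeddings is treated as a sync probability, and the loss is its negative log. The sketch below assumes NumPy and precomputed embedding arrays of shape (batch, dim), and its function names are hypothetical. It shows only this baseline loss. The paper's stabilized synchronization loss and AVSyncNet are not specified in the abstract and are not reproduced here.

```python
import numpy as np

def sync_probability(video_emb, audio_emb, eps=1e-8):
    """Row-wise cosine similarity between paired video and audio embeddings.

    With ReLU-activated (non-negative) embeddings, as in Wav2Lip's SyncNet,
    this lies in [0, 1] and is read as a synchronization probability.
    """
    dot = np.sum(video_emb * audio_emb, axis=1)
    norms = np.linalg.norm(video_emb, axis=1) * np.linalg.norm(audio_emb, axis=1)
    return dot / np.maximum(norms, eps)

def lip_sync_loss(video_emb, audio_emb, eps=1e-8):
    """Mean negative log sync probability over the batch (baseline, not the paper's loss)."""
    p = sync_probability(video_emb, audio_emb, eps)
    # Assumption: clip so the log stays finite when the embeddings are orthogonal.
    p = np.clip(p, eps, 1.0)
    return float(np.mean(-np.log(p)))

if __name__ == "__main__":
    v = np.array([[1.0, 0.0], [0.0, 1.0]])
    print(lip_sync_loss(v, v.copy()))  # perfectly aligned pairs
    print(lip_sync_loss(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])))  # orthogonal pair
```

Aligned embeddings drive the loss toward zero, while orthogonal ones push it to the ceiling set by the clip. How much this particular formulation contributes to the instability described in the abstract is a question for the paper itself.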

Authors (5)
  1. Dogucan Yaman (13 papers)
  2. Fevziye Irem Eyiokur (12 papers)
  3. Leonard Bärmann (6 papers)
  4. Alexander Waibel (45 papers)
  5. Hazim Kemal Ekenel (15 papers)
Citations (1)
