Show Me Your Face, And I'll Tell You How You Speak (2206.14009v1)

Published 28 Jun 2022 in cs.CV, cs.SD, eess.AS, and eess.IV

Abstract: When we speak, the prosody and content of the speech can be inferred from the movement of our lips. In this work, we explore the task of lip-to-speech synthesis, i.e., learning to generate speech given only a speaker's lip movements. We focus on learning accurate lip-to-speech mappings for multiple speakers in unconstrained, large-vocabulary settings. We capture the speaker's voice identity through their facial characteristics (age, gender, and ethnicity) and condition on these together with the lip movements to generate speaker-identity-aware speech. To this end, we present a novel method, "Lip2Speech", with key design choices for accurate lip-to-speech synthesis in unconstrained scenarios. We also perform various experiments and an extensive evaluation using quantitative and qualitative metrics as well as human evaluation.

Authors (3)
  1. Christen Millerdurai (5 papers)
  2. Lotfy Abdel Khaliq (2 papers)
  3. Timon Ulrich (1 paper)