
Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline (2307.09821v1)

Published 19 Jul 2023 in cs.CV and cs.MM

Abstract: In dyadic speaker-listener interactions, the listener's head reactions and the speaker's head movements together constitute an important form of non-verbal semantic expression. The listener head generation task aims to synthesize responsive listener head videos from the speaker's audio and reference images of the listener. Compared with talking-head generation, it is more challenging because the model must capture correlation cues from the speaker's audio and visual information. Following the ViCo baseline scheme, we propose a high-performance solution that enhances the hierarchical semantic extraction capability of the audio encoder module and improves the decoder, renderer, and post-processing modules. Our solution took first place on the official leaderboard for the listening head generation track. This paper is a technical report for the ViCo@2023 Conversational Head Generation Challenge at the ACM Multimedia 2023 conference.

Authors (4)
  1. Zhigang Chang (8 papers)
  2. Weitai Hu (2 papers)
  3. Qing Yang (138 papers)
  4. Shibao Zheng (21 papers)
Citations (3)
