Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos (2008.01652v1)

Published 2 Aug 2020 in cs.CV, cs.MM, and eess.IV

Abstract: We propose a novel deep multi-modality neural network for restoring very low bit-rate videos of talking heads. Such video content is very common in social media, teleconferencing, distance education, tele-medicine, etc., and often needs to be transmitted over limited bandwidth. The proposed CNN method exploits the correlations among three modalities (video, audio, and the emotional state of the speaker) to remove the video compression artifacts caused by spatial downsampling and quantization. The deep learning approach turns out to be ideally suited to the video restoration task, as the complex non-linear cross-modality correlations are very difficult to model analytically and explicitly. The new method is a video post-processor that can significantly boost the perceptual quality of aggressively compressed talking-head videos while remaining fully compatible with all existing video compression standards.

Authors (3)
  1. Yanhui Guo (21 papers)
  2. Xi Zhang (303 papers)
  3. Xiaolin Wu (40 papers)
Citations (10)
