Adaptive Super Resolution For One-Shot Talking-Head Generation (2403.15944v1)

Published 23 Mar 2024 in cs.CV, cs.AI, and eess.IV

Abstract: One-shot talking-head generation synthesizes a talking-head video from a single source portrait image, driven by a video of the same or a different identity. These methods typically rely on plane-based pixel transformations via Jacobian matrices or facial image warps to generate novel poses. The constraints of a single source image and pixel displacements often compromise the clarity of the synthesized images. Some methods try to improve the quality of the synthesized videos by introducing additional super-resolution modules, but this inevitably increases computational cost and distorts the original data distribution. In this work, we propose an adaptive high-quality talking-head video generation method that synthesizes high-resolution video without additional pre-trained modules. Specifically, inspired by existing super-resolution methods, we down-sample the one-shot source image and then adaptively reconstruct high-frequency details via an encoder-decoder module, resulting in enhanced video clarity. Our method consistently improves the quality of generated videos through a straightforward yet effective strategy, as substantiated by quantitative and qualitative evaluations. The code and demo video are available at: https://github.com/Songluchuan/AdaSR-TalkingHead/
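The abstract's central mechanism, down-sampling the one-shot source image and then adaptively reconstructing the lost high-frequency detail with an encoder-decoder, can be sketched in PyTorch as below. This is a minimal illustrative sketch, not the paper's actual AdaSR architecture: the layer sizes, the residual formulation, and the HighFreqEncoderDecoder name are assumptions made for demonstration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HighFreqEncoderDecoder(nn.Module):
    # Toy encoder-decoder that restores detail lost by down-sampling.
    # Channel counts and depth are illustrative guesses, not the paper's design.
    def __init__(self, ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1),
        )

    def forward(self, degraded):
        # Predict the high-frequency residual and add it back to the input.
        return degraded + self.decoder(self.encoder(degraded))

source = torch.randn(1, 3, 256, 256)                 # one-shot source portrait
low = F.interpolate(source, scale_factor=0.5,        # down-sample, discarding detail
                    mode='bilinear', align_corners=False)
degraded = F.interpolate(low, size=source.shape[-2:],  # back to target resolution
                         mode='bilinear', align_corners=False)
restored = HighFreqEncoderDecoder()(degraded)        # reconstruct high-freq details
print(restored.shape)                                # torch.Size([1, 3, 256, 256])

In this sketch the network only has to learn the residual removed by the down/up-sampling round trip, leaving the low-frequency content of the source intact; the paper's adaptive module presumably also conditions this reconstruction on the driving motion, which is omitted here.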
