
Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer (2404.13640v1)

Published 21 Apr 2024 in cs.MM, cs.CV, and eess.IV

Abstract: Multiple complex degradations are coupled in low-quality video faces in the real world. Therefore, blind video face restoration is a highly challenging ill-posed problem, requiring not only hallucinating high-fidelity details but also enhancing temporal coherence across diverse pose variations. Restoring each frame independently in a naive manner inevitably introduces temporal incoherence and artifacts from pose changes and keypoint localization errors. To address this, we propose the first blind video face restoration approach with a novel parsing-guided temporal-coherent transformer (PGTFormer) without pre-alignment. PGTFormer leverages semantic parsing guidance to select optimal face priors for generating temporally coherent artifact-free results. Specifically, we pre-train a temporal-spatial vector quantized auto-encoder on high-quality video face datasets to extract expressive context-rich priors. Then, the temporal parse-guided codebook predictor (TPCP) restores faces in different poses based on face parsing context cues without performing face pre-alignment. This strategy reduces artifacts and mitigates jitter caused by cumulative errors from face pre-alignment. Finally, the temporal fidelity regulator (TFR) enhances fidelity through temporal feature interaction and improves video temporal consistency. Extensive experiments on face videos show that our method outperforms previous face restoration baselines. The code will be released at https://github.com/kepengxu/PGTFormer.
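
The abstract mentions a vector quantized auto-encoder whose codebook stores high-quality face priors. As a rough illustration of what a codebook lookup means in a generic VQ auto-encoder, the sketch below maps each feature vector to its nearest codebook entry. This is not the paper's method: the abstract says PGTFormer predicts codes with a parsing-guided transformer (TPCP), and the function names, toy codebook, and feature values here are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    features: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Replace each feature with its nearest codebook entry.

    Returns the chosen code indices and the quantized vectors.
    Nearest-neighbour lookup is the generic VQ step; a learned
    predictor (as in the paper) would output the indices instead.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for feature in features:
        best = min(
            range(len(codebook)),
            key=lambda k: squared_distance(feature, codebook[k]),
        )
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


# Toy data (hypothetical): three 2-D codes and three degraded features.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

indices, quantized = quantize(features, codebook)
print(indices)    # [0, 1, 2]
print(quantized)  # [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

In this toy setting each noisy feature snaps to a clean code, which is the intuition behind using a codebook learned from high-quality data as a restoration prior.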
