
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution (2403.17000v1)

Published 25 Mar 2024 in cs.CV and cs.MM

Abstract: Diffusion models are at a tipping point for the image super-resolution task. Nevertheless, it is not trivial to capitalize on diffusion models for video super-resolution, which necessitates not only the preservation of visual appearance from low-resolution to high-resolution videos, but also temporal consistency across video frames. In this paper, we propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo), for video super-resolution. SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction. Technically, SATeCo freezes all the parameters of the pre-trained UNet and VAE, and only optimizes two deliberately designed modules, spatial feature adaptation (SFA) and temporal feature alignment (TFA), in the decoders of the UNet and VAE. SFA modulates frame features by adaptively estimating affine parameters for each pixel, guaranteeing pixel-wise guidance for high-resolution frame synthesis. TFA models feature interaction within a 3D local window (tubelet) through self-attention, and executes cross-attention between each tubelet and its low-resolution counterpart to guide temporal feature alignment. Extensive experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
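
The per-pixel affine modulation that the abstract attributes to SFA can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `estimate_affine` and `modulate`, the fixed weights `w_scale` and `w_shift`, and the `features * (1 + scale) + shift` form are all assumptions chosen for illustration. In the paper the affine parameters are estimated adaptively by a learned module; here a fixed linear map of the low-resolution guidance stands in for it.

```python
# Hypothetical sketch of per-pixel affine feature modulation.
# All names and constants are illustrative assumptions, not from the paper.


def estimate_affine(guidance, w_scale=0.5, w_shift=0.1):
    """Map each guidance value to a per-pixel (scale, shift) pair.

    A learned network would normally do this; a fixed linear map
    is used here only to keep the example self-contained.
    """
    scale = [[w_scale * g for g in row] for row in guidance]
    shift = [[w_shift * g for g in row] for row in guidance]
    return scale, shift


def modulate(features, scale, shift):
    """Apply features * (1 + scale) + shift independently at every pixel."""
    return [
        [f * (1.0 + s) + b for f, s, b in zip(f_row, s_row, b_row)]
        for f_row, s_row, b_row in zip(features, scale, shift)
    ]


# A 2x2 single-channel feature map and a guidance map of the same size.
features = [[1.0, 2.0], [3.0, 4.0]]
guidance = [[0.0, 1.0], [2.0, -1.0]]

scale, shift = estimate_affine(guidance)
out = modulate(features, scale, shift)
print(out)
```

Because every pixel receives its own scale and shift, a pixel whose guidance is zero passes through unchanged, while neighbouring pixels can be amplified or attenuated independently. That is the sense in which the guidance is pixel-wise rather than global.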

Authors
  1. Zhikai Chen
  2. Fuchen Long
  3. Zhaofan Qiu
  4. Ting Yao
  5. Wengang Zhou
  6. Jiebo Luo
  7. Tao Mei
