
Boosting Neural Representations for Videos with a Conditional Decoder (2402.18152v3)

Published 28 Feb 2024 in eess.IV, cs.AI, and cs.CV

Abstract: Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing, showing remarkable versatility across various video tasks. However, existing methods often fail to fully leverage their representation capabilities, primarily due to inadequate alignment of intermediate features during target frame decoding. This paper introduces a universal boosting framework for current implicit video representation approaches. Specifically, we utilize a conditional decoder with a temporal-aware affine transform module, which uses the frame index as a prior condition to effectively align intermediate features with target frames. We also introduce a sinusoidal NeRV-like block to generate diverse intermediate features and achieve a more balanced parameter distribution, thereby enhancing the model's capacity. With a high-frequency information-preserving reconstruction loss, our approach improves the reconstruction quality and convergence speed of multiple baseline INRs on video regression, and yields superior inpainting and interpolation results. Furthermore, we integrate a consistent entropy minimization technique and develop video codecs based on these boosted INRs. Experiments on the UVG dataset confirm that our enhanced codecs significantly outperform baseline INRs and offer competitive rate-distortion performance compared to traditional and learning-based codecs. Code is available at https://github.com/Xinjie-Q/Boosting-NeRV.
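
The idea of conditioning intermediate features on the frame index can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it assumes a generic feature-wise affine modulation in which a sinusoidal embedding of the normalized frame index is mapped to a per-channel scale and shift. The function names (`encode_frame_index`, `make_params`, `temporal_affine`), the embedding size, and the random initialization are all hypothetical choices made for illustration.

```python
import math
import random


def encode_frame_index(t, num_freqs=4):
    """Sinusoidal embedding of a normalized frame index t in [0, 1] (assumed form)."""
    emb = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        emb.append(math.sin(freq * t))
        emb.append(math.cos(freq * t))
    return emb


def linear(x, weight, bias):
    """Plain matrix-vector product plus bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def make_params(num_channels, emb_dim, seed=0):
    """Hypothetical parameters: small random weights, scale bias 1, shift bias 0."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(emb_dim)]
                for _ in range(num_channels)]

    return {
        "w_scale": mat(), "b_scale": [1.0] * num_channels,
        "w_shift": mat(), "b_shift": [0.0] * num_channels,
    }


def temporal_affine(features, t, params, num_freqs=4):
    """Apply a frame-index-conditioned scale and shift to each channel.

    features: list of channels, each an H x W nested list of floats.
    """
    emb = encode_frame_index(t, num_freqs)
    scale = linear(emb, params["w_scale"], params["b_scale"])
    shift = linear(emb, params["w_shift"], params["b_shift"])
    return [
        [[scale[c] * v + shift[c] for v in row] for row in channel]
        for c, channel in enumerate(features)
    ]


# Usage: the same features are modulated differently for different frames.
feats = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
params = make_params(num_channels=2, emb_dim=8)
first_frame = temporal_affine(feats, 0.0, params)
middle_frame = temporal_affine(feats, 0.5, params)
```

In a trained model the scale and shift would be learned jointly with the decoder; here they are random, so the sketch only shows the data flow from frame index to modulated features.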

Authors (8)
  1. Xinjie Zhang (27 papers)
  2. Ren Yang (25 papers)
  3. Dailan He (25 papers)
  4. Xingtong Ge (9 papers)
  5. Tongda Xu (31 papers)
  6. Yan Wang (734 papers)
  7. Hongwei Qin (38 papers)
  8. Jun Zhang (1008 papers)
Citations (6)
