
NERV++: An Enhanced Implicit Neural Video Representation (2402.18305v1)

Published 28 Feb 2024 in eess.IV and cs.CV

Abstract: Neural fields, also known as implicit neural representations (INRs), have shown a remarkable capability for representing, generating, and manipulating various data types, allowing continuous data reconstruction at a low memory footprint. Though promising, INRs applied to video compression still need to improve their rate-distortion performance by a large margin, and they require a huge number of parameters and long training iterations to capture high-frequency details, which limits their wider applicability. Resolving these problems remains challenging, and doing so would make INRs more accessible for compression tasks. We take a step towards resolving these shortcomings by introducing NeRV++, an enhanced implicit neural video representation: a more straightforward yet effective enhancement of the original NeRV decoder architecture, featuring separable conv2d residual blocks (SCRBs) that sandwich the upsampling block (UB) and a bilinear interpolation skip layer for improved feature representation. NeRV++ allows videos to be directly represented as a function approximated by a neural network and significantly enhances the representation capacity beyond current INR-based video codecs. We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs. This achievement narrows the gap to autoencoder-based video coding, marking a significant stride in INR-based video compression research.
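To make the "bilinear interpolation skip layer" around the upsampling block more concrete, here is a minimal pure-Python sketch on a single-channel feature map. It is an illustration under stated assumptions, not the authors' implementation: the function names, the half-pixel sampling convention, merging the skip by addition, and the `toy_main_path` stand-in for the learned SCRB, UB, SCRB sequence are all hypothetical. The only element taken from the original NeRV decoder is the use of a pixel-shuffle rearrangement for upsampling.

```python
import math


def bilinear_upsample(img, scale):
    """Upsample a 2D grid (list of rows) by an integer factor.

    Uses half-pixel sample centers with edge clamping (an assumed convention).
    """
    h, w = len(img), len(img[0])
    out = []
    for i in range(h * scale):
        # Map the output row center back into input coordinates.
        y = min(max((i + 0.5) / scale - 0.5, 0.0), h - 1.0)
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for j in range(w * scale):
            x = min(max((j + 0.5) / scale - 0.5, 0.0), w - 1.0)
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            top = (1 - wx) * img[y0][x0] + wx * img[y0][x1]
            bottom = (1 - wx) * img[y1][x0] + wx * img[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


def pixel_shuffle(channels, r):
    """Rearrange r*r channels of size HxW into one (H*r)x(W*r) channel."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for k, ch in enumerate(channels):
        dy, dx = divmod(k, r)  # channel k fills sub-pixel offset (dy, dx)
        for i in range(h):
            for j in range(w):
                out[i * r + dy][j * r + dx] = ch[i][j]
    return out


def toy_main_path(x, scale):
    """Hypothetical stand-in for the learned SCRB -> UB -> SCRB path.

    It copies x into scale*scale channels and pixel-shuffles them, which is
    plain nearest-neighbour upsampling; a real model would use trained convs.
    """
    return pixel_shuffle([x] * (scale * scale), scale)


def upsample_with_skip(x, scale):
    """Add the main-path output to a bilinear skip (addition is assumed)."""
    main = toy_main_path(x, scale)
    skip = bilinear_upsample(x, scale)
    return [[m + s for m, s in zip(mr, sr)] for mr, sr in zip(main, skip)]


if __name__ == "__main__":
    feature_map = [[1.0, 3.0]]
    for row in upsample_with_skip(feature_map, 2):
        print(row)
```

The skip path gives the block a smooth, parameter-free upsampled copy of its input, so the learned path only has to model the residual detail; whether NeRV++ merges the two paths by addition or in some other way is not stated in the abstract.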

Authors (3)
  1. Ahmed Ghorbel
  2. Wassim Hamidouche
  3. Luce Morin
