NeRV++: An Enhanced Implicit Neural Video Representation (2402.18305v1)
Abstract: Neural fields, also known as implicit neural representations (INRs), have shown a remarkable capability for representing, generating, and manipulating various data types, allowing continuous data reconstruction at a low memory footprint. Though promising, INRs applied to video compression still need to improve their rate-distortion performance by a large margin, and they require a huge number of parameters and long training iterations to capture high-frequency details, which limits their wider applicability. Resolving these problems remains challenging, yet doing so would make INRs more accessible for compression tasks. We take a step towards resolving these shortcomings by introducing NeRV++, an enhanced implicit neural video representation that is a straightforward yet effective enhancement of the original NeRV (neural representations for videos) decoder architecture. It features separable conv2d residual blocks (SCRBs) that sandwich the upsampling block (UB), and a bilinear interpolation skip layer for improved feature representation. NeRV++ allows videos to be directly represented as a function approximated by a neural network, and significantly enhances the representation capacity beyond current INR-based video codecs. We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs. This achievement narrows the gap to autoencoder-based video coding, marking a significant stride in INR-based video compression research.
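Two of the ingredients named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes that "separable conv2d" means a depthwise k x k convolution followed by a pointwise 1 x 1 convolution, and that the bilinear skip upsamples a stage's input and adds it to the stage's output. The abstract specifies neither detail, and every function name below is hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv plus a pointwise 1 x 1 conv (bias ignored).
    Assumption: this is what 'separable conv2d' refers to."""
    return k * k * c_in + c_in * c_out


def bilinear_upsample(grid, scale):
    """Upsample a 2D list of floats by an integer factor using bilinear
    interpolation with half-pixel sample centers and edge clamping."""
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(h * scale):
        # Map the output pixel center back to input coordinates, then clamp.
        y = min(max((i + 0.5) / scale - 0.5, 0.0), h - 1)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for j in range(w * scale):
            x = min(max((j + 0.5) / scale - 0.5, 0.0), w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def stage_with_bilinear_skip(x, main_branch, scale):
    """Add a bilinearly upsampled copy of x to the main-branch output.
    main_branch is a hypothetical stand-in for the SCRB -> UB -> SCRB path
    and must return a map that is `scale` times larger in each dimension."""
    main = main_branch(x)
    skip = bilinear_upsample(x, scale)
    return [[m + s for m, s in zip(mr, sr)] for mr, sr in zip(main, skip)]


if __name__ == "__main__":
    print(standard_conv_params(64, 64, 3))   # 36864
    print(separable_conv_params(64, 64, 3))  # 4672
    print(bilinear_upsample([[0.0, 1.0], [2.0, 3.0]], 2)[1])  # [0.5, 0.75, 1.25, 1.5]
```

Under these assumptions, a 3 x 3 layer with 64 input and 64 output channels needs roughly an eighth of the weights of its dense counterpart, which suggests how separable blocks could add depth around the upsampling block at a modest parameter cost. The skip path itself carries no learned weights.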
- Ahmed Ghorbel
- Wassim Hamidouche
- Luce Morin