Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression (2405.04274v2)
Abstract: Content-adaptive compression is crucial for adapting a pre-trained neural codec to diverse content. Although such methods have proven practical in neural image compression (NIC), their application in neural video compression (NVC) remains limited for two main reasons: (1) video compression relies heavily on temporal redundancy, so updating just one or a few frames can cause significant errors to accumulate over time; (2) NVC frameworks are generally more complex, with many large components that are not easy to update quickly during encoding. To address these challenges, we propose a content-adaptive NVC technique called Group-aware Parameter-Efficient Updating (GPU). First, to minimize error accumulation, we adopt a group-aware approach for updating encoder parameters. This involves a patch-based Group of Pictures (GoP) training strategy that segments a video into patch-based GoPs, which are then updated to facilitate a globally optimized, domain-transferable solution. Second, we introduce a parameter-efficient delta-tuning strategy, which integrates several lightweight adapters into each coding component of the encoding process in both serial and parallel configurations. Such architecture-agnostic modules stimulate the components with large parameters, thereby reducing both the update cost and the encoding time. We incorporate our GPU into the latest NVC framework and conduct comprehensive experiments, whose results show outstanding video compression efficiency across four video benchmarks and adaptability on one medical image benchmark.
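The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how patch-based GoPs are sampled or how the adapters are built, so the tiling scheme, the residual form of the adapters, and every name below (`patch_gops`, `serial_adapter`, `parallel_adapter`, and the toy layers) are assumptions made for illustration only.

```python
from typing import Callable, Iterator, List, Tuple

# (frame_start, frame_end, top, left, bottom, right); end indices are exclusive.
PatchGoP = Tuple[int, int, int, int, int, int]


def patch_gops(num_frames: int, height: int, width: int,
               gop_size: int, patch: int) -> Iterator[PatchGoP]:
    """Yield spatio-temporal blocks that together cover the whole video.

    Each block spans up to `gop_size` consecutive frames and one
    `patch` x `patch` spatial window (border blocks may be smaller).
    Non-overlapping tiling is an assumption, not taken from the paper.
    """
    for t in range(0, num_frames, gop_size):
        for y in range(0, height, patch):
            for x in range(0, width, patch):
                yield (t, min(t + gop_size, num_frames),
                       y, x, min(y + patch, height), min(x + patch, width))


Vector = List[float]
Layer = Callable[[Vector], Vector]


def serial_adapter(frozen: Layer, adapter: Layer) -> Layer:
    """Adapter placed after the frozen layer: h + adapter(h), h = frozen(x)."""
    def forward(x: Vector) -> Vector:
        h = frozen(x)
        return [a + b for a, b in zip(h, adapter(h))]
    return forward


def parallel_adapter(frozen: Layer, adapter: Layer) -> Layer:
    """Adapter placed beside the frozen layer: frozen(x) + adapter(x)."""
    def forward(x: Vector) -> Vector:
        return [a + b for a, b in zip(frozen(x), adapter(x))]
    return forward


if __name__ == "__main__":
    blocks = list(patch_gops(num_frames=8, height=64, width=96,
                             gop_size=4, patch=32))
    print(len(blocks), blocks[0])  # 12 (0, 4, 0, 0, 32, 32)

    def double(v: Vector) -> Vector:   # stand-in for a frozen codec layer
        return [2.0 * e for e in v]

    def scale(v: Vector) -> Vector:    # stand-in for a small trainable adapter
        return [0.5 * e for e in v]

    print(serial_adapter(double, scale)([1.0, 2.0]))    # [3.0, 6.0]
    print(parallel_adapter(double, scale)([1.0, 2.0]))  # [2.5, 5.0]
```

In a setup like this, only the adapter would be updated per video while the large pre-trained layer stays frozen, which is the general motivation behind delta-tuning; updating over all patch-based GoPs rather than a single frame is what the abstract refers to as the group-aware strategy.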
Authors: Zhenghao Chen, Luping Zhou, Zhihao Hu, Dong Xu