Accelerating Learnt Video Codecs with Gradient Decay and Layer-wise Distillation (2312.02605v1)
Abstract: In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in terms of compression efficiency. However, most learning-based video compression models are associated with high computational complexity and latency, in particular at the decoder side, which limits their deployment in practical applications. In this paper, we present a novel model-agnostic pruning scheme based on gradient decay and adaptive layer-wise distillation. Gradient decay enhances parameter exploration during sparsification whilst preventing runaway sparsity, and is superior to standard Straight-Through Estimation. The adaptive layer-wise distillation regulates the sparse training in various stages based on the distortion of intermediate features. This stage-wise design efficiently updates parameters with minimal computational overhead. The proposed approach has been applied to three popular end-to-end learnt video codecs: FVC, DCVC, and DCVC-HEM. Results confirm that our method yields up to a 65% reduction in MACs and a 2x speed-up with less than a 0.3 dB drop in BD-PSNR. Supporting code and supplementary material can be downloaded from: https://jasminepp.github.io/lightweightdvc/
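To make the two components in the abstract concrete, the sketch below illustrates them under assumed details: a magnitude-pruning mask whose backward pass decays the gradients of pruned weights (rather than passing them through unchanged, as Straight-Through Estimation would), and a layer-wise distillation loss over intermediate features. This is a minimal PyTorch-style sketch, not the authors' implementation; names such as `GradientDecayMask`, `gamma`, and `layerwise_distillation_loss` are hypothetical placeholders.

```python
# Minimal sketch (assumptions, not the paper's code): gradient-decayed pruning
# mask and a layer-wise feature-distillation loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradientDecayMask(torch.autograd.Function):
    """Apply a binary pruning mask in the forward pass.

    STE would pass gradients to pruned weights unchanged; here the gradient
    reaching pruned weights is scaled by a decay factor `gamma`, assumed to be
    annealed towards zero as sparsification proceeds.
    """

    @staticmethod
    def forward(ctx, weight, mask, gamma):
        ctx.save_for_backward(mask)
        ctx.gamma = gamma
        return weight * mask

    @staticmethod
    def backward(ctx, grad_output):
        (mask,) = ctx.saved_tensors
        # Kept weights receive the full gradient; pruned weights receive a
        # decayed gradient, so they can still be revived early in training
        # while runaway growth of the pruned set is discouraged later on.
        grad_weight = grad_output * (mask + (1.0 - mask) * ctx.gamma)
        return grad_weight, None, None


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).to(weight.dtype)


def layerwise_distillation_loss(student_feats, teacher_feats):
    """MSE between intermediate features of the sparse student and the dense teacher."""
    return sum(F.mse_loss(s, t.detach()) for s, t in zip(student_feats, teacher_feats))


# Usage sketch: prune one convolution of a hypothetical decoder layer.
conv = nn.Conv2d(64, 64, 3, padding=1)
x = torch.randn(1, 64, 32, 32)
mask = magnitude_mask(conv.weight.data, sparsity=0.5)
gamma = 0.1  # assumed to decay towards 0 over the sparse-training schedule
w_sparse = GradientDecayMask.apply(conv.weight, mask, gamma)
y = F.conv2d(x, w_sparse, conv.bias, padding=1)
```

The exact decay schedule, sparsity targets, and the stage-wise weighting of the distillation terms are design choices of the paper and are not reproduced here.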
- B. Bross, Y.-K. Wang, Y. Ye, S. Liu, J. Chen, G. J. Sullivan, and J.-R. Ohm, “Overview of the versatile video coding (VVC) standard and its applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736–3764, 2021.
- D. Ma, F. Zhang, and D. R. Bull, “MFRNet: a new CNN architecture for post-processing and in-loop filtering,” IEEE Journal of Selected Topics in Signal Processing, vol. 15, no. 2, pp. 378–387, 2020.
- C. Feng, D. Danier, C. Tan, F. Zhang, and D. Bull, “ViSTRA3: Video coding with deep parameter adaptation and post processing,” in 2022 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2022, pp. 824–828.
- D. Ma, M. Afonso, F. Zhang, and D. R. Bull, “Perceptually-inspired super-resolution of compressed videos,” in Applications of Digital Image Processing XLII, vol. 11137. SPIE, 2019, pp. 310–318.
- G. Lu, W. Ouyang, D. Xu, X. Zhang, C. Cai, and Z. Gao, “DVC: An end-to-end deep video compression framework,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11006–11015.
- Z. Hu, G. Lu, and D. Xu, “FVC: A new framework towards deep video compression in feature space,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1502–1511.
- J. Li, B. Li, and Y. Lu, “Deep contextual video compression,” Advances in Neural Information Processing Systems, vol. 34, 2021.
- G. Gao, P. You, R. Pan, S. Han, Y. Zhang, Y. Dai, and H. Lee, “Neural image compression via attentional multi-scale back projection and frequency decomposition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 14677–14686.
- J. Li, B. Li, and Y. Lu, “Hybrid spatial-temporal entropy modelling for neural video compression,” in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 1503–1511.
- J. Li, B. Li, and Y. Lu, “Neural video compression with diverse contexts,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
- H. M. Kwan, G. Gao, F. Zhang, A. Gower, and D. Bull, “HiNeRV: Video Compression with Hierarchical Encoding based Neural Representation,” arXiv preprint arXiv:2306.09818, 2023.
- G.-H. Wang, J. Li, B. Li, and Y. Lu, “EVC: Towards real-time neural image compression with mask decay,” arXiv preprint arXiv:2302.05071, 2023.
- A. Luo, H. Sun, J. Liu, and J. Katto, “Memory-efficient learned image compression with pruned hyperprior module,” in 2022 IEEE International Conference on Image Processing (ICIP). IEEE, 2022, pp. 3061–3065.
- S. Yin, C. Li, F. Meng, W. Tan, Y. Bao, Y. Liang, and W. Liu, “Exploring structural sparsity in neural image compression,” in 2022 IEEE International Conference on Image Processing (ICIP). IEEE, 2022, pp. 471–475.
- H. Sun, L. Yu, and J. Katto, “Q-LIC: Quantizing learned image compression with channel splitting,” IEEE Transactions on Circuits and Systems for Video Technology, 2022.
- J.-H. Kim, J.-H. Choi, J. Chang, and J.-S. Lee, “Efficient deep learning-based lossy image compression via asymmetric autoencoder and pruning,” in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020, pp. 2063–2067.
- Z. Hu and D. Xu, “Complexity-guided slimmable decoder for efficient deep video compression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14358–14367.
- Z. Liu, L. Herranz, F. Yang, S. Zhang, S. Wan, M. Mrak, and M. G. Blanch, “Slimmable video codec,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1743–1747.
- Y. Li, K. Adamczewski, W. Li, S. Gu, R. Timofte, and L. Van Gool, “Revisiting random channel pruning for neural network compression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 191–201.
- A. I. Nowak, B. Grooten, D. C. Mocanu, and J. Tabor, “Fantastic weights and how to find them: Where to prune in dynamic sparse training,” arXiv preprint arXiv:2306.12230, 2023.
- Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv preprint arXiv:1308.3432, 2013.
- I. Lazarevich, A. Kozlov, and N. Malinin, “Post-training deep neural network pruning via layer-wise calibration,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 798–805.
- T. Xue, B. Chen, J. Wu, D. Wei, and W. T. Freeman, “Video enhancement with task-oriented flow,” International Journal of Computer Vision, vol. 127, pp. 1106–1125, 2019.
- X. Sheng, J. Li, B. Li, L. Li, D. Liu, and Y. Lu, “Temporal context mining for learned video compression,” IEEE Transactions on Multimedia, 2022.
- F. Mentzer, G. Toderici, D. Minnen, S.-J. Hwang, S. Caelles, M. Lucic, and E. Agustsson, “VCT: A video compression transformer,” arXiv preprint arXiv:2206.07307, 2022.
- A. Mercat, M. Viitanen, and J. Vanne, “UVG Dataset: 50/120fps 4K Sequences for Video Codec Analysis and Development,” in MMSys. ACM, 2020, pp. 297–302.
- H. Wang, W. Gan, S. Hu, J. Y. Lin, L. Jin, L. Song, P. Wang, I. Katsavounidis, A. Aaron, and C.-C. J. Kuo, “MCL-JCV: A JND-based H.264/AVC video quality assessment dataset,” in 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 2016, pp. 1509–1513.