Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs (2403.18535v1)
Abstract: Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory, notably in the use of VAEs to estimate a theoretical upper bound on the information rate-distortion function of images. These estimated theoretical bounds substantially exceed the performance of existing neural image codecs (NICs). To narrow this gap, we propose a theoretical bound-guided hierarchical VAE (BG-VAE) for NIC. The proposed BG-VAE uses the theoretical bound to guide the NIC model towards better performance. We implement BG-VAE using hierarchical VAEs and demonstrate its effectiveness through extensive experiments. Combined with advanced neural network blocks, the result is a versatile, variable-rate NIC that outperforms existing methods when both rate-distortion performance and computational complexity are considered. The code is available at BG-VAE.
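The link between VAEs and rate-distortion theory mentioned above comes from reading a VAE's training objective as a rate-distortion Lagrangian. The KL divergence between the approximate posterior and the prior plays the role of rate, and the reconstruction error plays the role of distortion. The following minimal Python sketch computes that objective for a single image under illustrative assumptions: a diagonal-Gaussian posterior, a standard-normal prior, MSE distortion, and a trade-off weight `lam`. The function names are hypothetical, and this is not the BG-VAE implementation.

```python
import math
from typing import Sequence, Tuple


def gaussian_kl_bits(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, exp(log_var)) || N(0, 1)) summed over latent dims, in bits.

    Assumption: diagonal-Gaussian posterior and standard-normal prior.
    """
    kl_nats = 0.0
    for m, lv in zip(mu, log_var):
        # Closed-form KL for one Gaussian dimension against a standard normal.
        kl_nats += 0.5 * (math.exp(lv) + m * m - 1.0 - lv)
    return kl_nats / math.log(2.0)  # convert nats to bits


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between a flattened image and its reconstruction."""
    assert len(x) == len(x_hat), "image and reconstruction must match in size"
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rd_loss(
    x: Sequence[float],
    x_hat: Sequence[float],
    mu: Sequence[float],
    log_var: Sequence[float],
    lam: float,
    num_pixels: int,
) -> Tuple[float, float, float]:
    """Return (R + lam * D, R in bits per pixel, D as MSE) for one image."""
    rate_bpp = gaussian_kl_bits(mu, log_var) / num_pixels
    distortion = mse(x, x_hat)
    return rate_bpp + lam * distortion, rate_bpp, distortion


if __name__ == "__main__":
    # Toy 4-pixel "image" with a 2-dimensional latent (illustrative values only).
    loss, rate, dist = rd_loss(
        x=[0.2, 0.4, 0.6, 0.8],
        x_hat=[0.25, 0.35, 0.6, 0.8],
        mu=[0.5, -0.5],
        log_var=[0.0, 0.0],
        lam=100.0,
        num_pixels=4,
    )
    print(f"loss={loss:.4f}  rate={rate:.4f} bpp  distortion={dist:.5f}")
```

Training such a model at several values of `lam` traces out a rate-distortion curve. Because the expected KL term upper-bounds the mutual information between the image and its latent, the curve of a well-trained VAE can serve as an estimate of an upper bound on the image rate-distortion function. That is the kind of bound the abstract describes using as guidance.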
- Yichi Zhang
- Zhihao Duan
- Yuning Huang
- Fengqing Zhu