Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs (2403.18535v1)

Published 27 Mar 2024 in eess.IV and cs.LG

Abstract: Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory, notably in utilizing VAEs to estimate the theoretical upper bound of the information rate-distortion function of images. Such estimated theoretical bounds substantially exceed the performance of existing neural image codecs (NICs). To narrow this gap, we propose a theoretical bound-guided hierarchical VAE (BG-VAE) for NIC. The proposed BG-VAE leverages the theoretical bound to guide the NIC model towards enhanced performance. We implement the BG-VAE using Hierarchical VAEs and demonstrate its effectiveness through extensive experiments. Along with advanced neural network blocks, we provide a versatile, variable-rate NIC that outperforms existing methods when considering both rate-distortion performance and computational complexity. The code is available at BG-VAE.
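
The abstract describes using a bound-estimating VAE to guide a practical codec, which resembles teacher-student knowledge distillation layered on top of the standard rate-distortion objective R + λD. The sketch below is a minimal, hypothetical PyTorch illustration of that idea, not the authors' implementation: the loss structure, the feature-matching terms, and the coefficients `lmbda` and `alpha` are all assumptions for exposition.

```python
import torch
import torch.nn.functional as F

def bound_guided_loss(x, student_out, teacher_out, lmbda=0.01, alpha=0.1):
    """Hypothetical bound-guided objective: R + lambda*D plus distillation.

    student_out / teacher_out: (reconstruction, bits-per-pixel, feature list).
    The teacher is the frozen bound-estimating VAE; its outputs are detached
    so gradients only update the student codec.
    """
    x_hat_s, bpp_s, feats_s = student_out
    x_hat_t, _, feats_t = teacher_out

    # Standard neural-codec training objective: rate + lambda * distortion.
    rd = bpp_s + lmbda * F.mse_loss(x_hat_s, x)

    # Distillation terms pulling the student toward the teacher's
    # reconstruction and intermediate features (one common formulation).
    distill = F.mse_loss(x_hat_s, x_hat_t.detach())
    for fs, ft in zip(feats_s, feats_t):
        distill = distill + F.mse_loss(fs, ft.detach())

    return rd + alpha * distill

# Toy usage: random tensors stand in for the outputs of real models.
x = torch.rand(1, 3, 64, 64)
student = (torch.rand_like(x, requires_grad=True), torch.tensor(0.5),
           [torch.randn(1, 8, 16, 16, requires_grad=True)])
teacher = (torch.rand_like(x), torch.tensor(0.4),
           [torch.randn(1, 8, 16, 16)])
loss = bound_guided_loss(x, student, teacher)
loss.backward()
```

In the paper's setting, the teacher would be a large hierarchical VAE trained to estimate the rate-distortion bound and kept frozen while the deployable student codec trains against it.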

Authors (4)
  1. Yichi Zhang (184 papers)
  2. Zhihao Duan (38 papers)
  3. Yuning Huang (11 papers)
  4. Fengqing Zhu (77 papers)
Citations (2)
