GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression (2405.01170v1)
Abstract: Transformer-based entropy models have gained prominence in recent years due to their superior ability, compared to convolution-based methods, to capture long-range dependencies when estimating probability distributions. However, previous transformer-based entropy models suffer from a sluggish coding process caused by pixel-wise autoregression or duplicated computation during inference. In this paper, we propose GroupedMixer, a novel transformer-based entropy model that enjoys both faster coding speed and better compression performance than previous transformer-based methods. Specifically, our approach builds upon group-wise autoregression: the latent variables are first partitioned into groups along the spatial and channel dimensions, and the groups are then entropy-coded with the proposed transformer-based entropy model. Global causal self-attention is decomposed into more efficient group-wise interactions, implemented with inner-group and cross-group token-mixers. The inner-group token-mixer aggregates contextual elements within a group, while the cross-group token-mixer attends to previously decoded groups. Alternately arranging the two token-mixers enables global contextual reference. To further expedite network inference, we introduce a context cache optimization for GroupedMixer, which caches attention activation values in the cross-group token-mixers and avoids complex, duplicated computation. Experimental results demonstrate that the proposed GroupedMixer yields state-of-the-art rate-distortion performance with fast compression speed.
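The abstract's decomposition of global causal self-attention into an inner-group mixer, a cross-group mixer, and a context cache can be sketched in a few lines. This is a minimal single-head NumPy toy, not the paper's architecture: the shared projection matrices (`wq`, `wk`, `wv`), the single head, and the additive combination of the two mixers are all simplifying assumptions for illustration. The essential structure it shows is that each group attends within itself (inner-group), attends to the keys/values of all previously decoded groups (cross-group), and that those keys/values are cached so decoding a new group never recomputes attention activations for earlier groups.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over one group of tokens.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

class GroupedMixerSketch:
    """Toy group-wise autoregressive token-mixer.

    Latent tokens are split into groups; decoding a group applies an
    inner-group mixer (attention within the group) and a cross-group
    mixer (attention over all previously decoded groups, read from a
    context cache of their keys/values).
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Single-head projections, shared by both mixers (an assumption).
        self.wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.cache_k = []  # context cache: keys of decoded groups
        self.cache_v = []  # context cache: values of decoded groups

    def decode_group(self, x):
        # x: (tokens_in_group, dim), the current group's latents.
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        # Inner-group token-mixer: contextual reference within the group.
        out = attention(q, k, v)
        # Cross-group token-mixer: reference to previously decoded groups,
        # using cached activations instead of recomputing them.
        if self.cache_k:
            ck = np.concatenate(self.cache_k)
            cv = np.concatenate(self.cache_v)
            out = out + attention(q, ck, cv)
        # Cache this group's keys/values for all later groups.
        self.cache_k.append(k)
        self.cache_v.append(v)
        return out
```

Because the cache grows by one group per decoding step, the cross-group mixer touches each earlier group's keys/values exactly once per new group, which is the duplicated computation the context cache optimization is designed to avoid.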