S2LIC: Learned Image Compression with the SwinV2 Block, Adaptive Channel-wise and Global-inter Attention Context (2403.14471v2)

Published 21 Mar 2024 in eess.IV

Abstract: Recently, deep learning has been successfully applied to image compression, yielding superior rate-distortion performance. Designing an effective and efficient entropy model to estimate the probability distribution of the latent representation is crucial; however, most entropy models capture only one-dimensional correlations, treating channel and spatial information separately. In this paper, we propose an Adaptive Channel-wise and Global-inter attention Context (ACGC) entropy model, which efficiently achieves dual feature aggregation in both inter-slice and intra-slice contexts. Specifically, we divide the latent representation into slices and apply the ACGC model in a parallel checkerboard context to achieve faster decoding and higher rate-distortion performance. To capture redundant global features across slices, we employ deformable attention in the adaptive global-inter attention, dynamically refining attention weights according to the actual spatial relationships and context. Furthermore, in the main transformation structure, we propose the high-performance S2LIC model: we introduce a residual SwinV2 Transformer to capture global feature information and use a dense block network as a feature enhancement module to improve the nonlinear representation of the image within the transformation structure. Experimental results demonstrate that our method achieves faster encoding and decoding speeds and outperforms VTM-17.1 and several recent learned image compression methods in both PSNR and MS-SSIM.
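
The abstract compresses several mechanisms into a few sentences; the minimal PyTorch sketch below illustrates two of them, channel-wise slicing of the latent and the two-pass parallel checkerboard context, to make the decoding order concrete. This is an illustrative reconstruction under stated assumptions, not the authors' code: `SliceContext`, the slice count of 5, and all shapes are invented for the example, and a plain convolutional stack stands in for the paper's ACGC attention blocks.

```python
import torch
import torch.nn as nn

def checkerboard_masks(h, w):
    """Two complementary 0/1 masks over an HxW grid. Anchor positions are
    decoded first from the hyperprior alone; non-anchor positions are then
    decoded conditioned on the already-decoded anchors."""
    grid = torch.arange(h)[:, None] + torch.arange(w)[None, :]
    anchor = (grid % 2 == 0).float()
    return anchor, 1.0 - anchor

class SliceContext(nn.Module):
    """Entropy-parameter head for one channel slice. A plain conv stack
    stands in here for the paper's ACGC attention blocks (assumption)."""
    def __init__(self, slice_ch, prev_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(slice_ch + prev_ch, 128, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(128, 2 * slice_ch, 1),  # per-element (mean, scale)
        )

    def forward(self, anchor_part, prev_slices):
        mean, scale = self.net(
            torch.cat([anchor_part, prev_slices], dim=1)
        ).chunk(2, dim=1)
        return mean, scale

# Toy walk-through: a 320-channel latent split into 5 slices of 64 channels,
# each decoded in two parallel checkerboard passes.
y = torch.randn(1, 320, 16, 16)
slices = y.chunk(5, dim=1)
a_mask, na_mask = checkerboard_masks(16, 16)

ctxs = nn.ModuleList(SliceContext(64, 64 * i) for i in range(5))
decoded = []
for i, s in enumerate(slices):
    prev = torch.cat(decoded, dim=1) if decoded else s.new_zeros(1, 0, 16, 16)
    anchors = s * a_mask                  # pass 1: anchor positions only
    mean, scale = ctxs[i](anchors, prev)  # pass 2: params for non-anchors
    decoded.append(s)                     # slice now available as context
```

Because every anchor (and every non-anchor) position within a slice can be processed in one batched pass, this layout avoids the serial pixel-by-pixel loop of fully autoregressive context models, which is the source of the decoding speedup the abstract claims; the inter-slice conditioning preserves channel-wise context at the cost of one pass per slice.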

Authors (6)
  1. Yongqiang Wang (92 papers)
  2. Feng Liang (61 papers)
  3. Haisheng Fu (15 papers)
  4. Qi Cao (57 papers)
  5. Shang Wang (25 papers)
  6. Zhenjiao Chen (1 paper)