Frequency-Aware Transformer for Learned Image Compression (2310.16387v4)

Published 25 Oct 2023 in eess.IV and cs.CV

Abstract: Learned image compression (LIC) has gained traction as an effective solution for image storage and transmission in recent years. However, existing LIC methods produce redundant latent representations due to limitations in capturing anisotropic frequency components and preserving directional details. To overcome these challenges, we propose a novel frequency-aware transformer (FAT) block that, for the first time, achieves multiscale directional analysis for LIC. The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images. Additionally, we introduce a frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance. Furthermore, we present a transformer-based channel-wise autoregressive (T-CA) model that effectively exploits channel dependencies. Experiments show that our method achieves state-of-the-art rate-distortion performance compared to existing LIC methods and clearly outperforms the latest standardized codec VTM-12.1 by 14.5%, 15.1%, and 13.0% in BD-rate on the Kodak, Tecnick, and CLIC datasets, respectively.
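The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of what "adaptively modulating different frequency components" inside a feed-forward block can look like, in the spirit of FFT-domain global filters. The class name FMFFN, the fixed spatial size, the real-valued learned filter, and the pointwise MLP are all assumptions for illustration, not the authors' actual design.

```python
import torch
import torch.nn as nn


class FMFFN(nn.Module):
    """Hypothetical sketch of a frequency-modulation feed-forward network.

    Idea: move features to the frequency domain, rescale each frequency
    component with a learned filter, transform back, then apply a
    pointwise MLP. The paper's exact design may differ.
    """

    def __init__(self, dim: int, height: int, width: int, expansion: int = 4):
        super().__init__()
        # Learned per-channel, per-frequency modulation weights
        # (rfft2 keeps only width // 2 + 1 frequencies along the last axis).
        self.filter = nn.Parameter(torch.ones(dim, height, width // 2 + 1))
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * expansion),
            nn.GELU(),
            nn.Linear(dim * expansion, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the preceding attention block.
        spec = torch.fft.rfft2(x, norm="ortho")          # to frequency domain
        spec = spec * self.filter                        # adaptive modulation
        x = torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")
        # Pointwise feed-forward, applied channel-last.
        return self.mlp(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)


# Usage: y = FMFFN(dim=192, height=16, width=16)(torch.randn(1, 192, 16, 16))
```

A layer like this could stand in for the standard feed-forward sublayer of a transformer block: the learned filter acts as a per-channel, per-frequency gain that can amplify or suppress individual frequency components of the feature map.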

Authors (6)
  1. Han Li (182 papers)
  2. Shaohui Li (5 papers)
  3. Wenrui Dai (35 papers)
  4. Chenglin Li (42 papers)
  5. Junni Zou (31 papers)
  6. Hongkai Xiong (75 papers)
Citations (8)
