Multi-Context Dual Hyper-Prior Neural Image Compression (2309.10799v1)

Published 19 Sep 2023 in eess.IV, cs.CV, and cs.LG

Abstract: Transform and entropy models are the two core components in deep image compression neural networks. Most existing learning-based image compression methods utilize convolution-based transforms, which lack the ability to model long-range dependencies due to the limited receptive field of the convolution operation. To address this limitation, we propose a Transformer-based nonlinear transform. This transform has the remarkable ability to efficiently capture both local and global information from the input image, leading to a more decorrelated latent representation. In addition, we introduce a novel entropy model that incorporates two different hyperpriors to model cross-channel and spatial dependencies of the latent representation. To further improve the entropy model, we add a global context that leverages distant relationships to predict the current latent more accurately. This global context employs a causal attention mechanism to extract long-range information in a content-dependent manner. Our experiments show that the proposed framework outperforms state-of-the-art methods in rate-distortion performance.

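The abstract itself contains no pseudocode, but the causal attention mechanism it describes can be illustrated with a minimal sketch. Everything below (the module name, single-head design, and dimensions) is an illustrative assumption rather than the authors' implementation: latents are treated as a flattened sequence in decoding order, and a lower-triangular mask restricts each position to attend only to positions that have already been decoded.

```python
import torch
import torch.nn as nn


class CausalSelfAttention(nn.Module):
    """Single-head causal self-attention over a flattened latent sequence.

    Hypothetical sketch only: the paper's global context module is
    attention-based and causal, but its exact architecture is not
    reproduced here.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)  # joint query/key/value projection
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim), latents ordered by decoding position.
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (batch, n, n) scores
        # Lower-triangular mask: position i attends only to positions <= i.
        # In an entropy model the input sequence is shifted by one step, so
        # the prediction for latent i conditions only on decoded latents.
        mask = torch.tril(torch.ones(n, n, dtype=torch.bool, device=x.device))
        attn = attn.masked_fill(~mask, float("-inf")).softmax(dim=-1)
        return self.proj(attn @ v)


# Toy usage: 256 latent positions with 192 channels each.
ctx = CausalSelfAttention(dim=192)
y_hat = torch.randn(1, 256, 192)
print(ctx(y_hat).shape)  # torch.Size([1, 256, 192])
```

The triangular mask is what keeps such a context model usable at decode time: the same long-range predictions can be recomputed position by position as latents are entropy-decoded, without ever peeking at symbols that have not been transmitted yet.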
Authors (6)
  1. Atefeh Khoshkhahtinat (9 papers)
  2. Ali Zafari (22 papers)
  3. Piyush M. Mehta (20 papers)
  4. Mohammad Akyash (10 papers)
  5. Hossein Kashiani (14 papers)
  6. Nasser M. Nasrabadi (104 papers)
Citations (6)