Multi-Context Dual Hyper-Prior Neural Image Compression (2309.10799v1)
Abstract: The transform and the entropy model are the two core components of deep image compression networks. Most existing learning-based image compression methods use convolution-based transforms, which struggle to model long-range dependencies because of the limited receptive field of the convolution operation. To address this limitation, we propose a Transformer-based nonlinear transform. This transform efficiently captures both local and global information from the input image, yielding a more decorrelated latent representation. In addition, we introduce a novel entropy model that incorporates two different hyperpriors to model the cross-channel and spatial dependencies of the latent representation. To further improve the entropy model, we add a global context that leverages distant relationships to predict the current latent more accurately. This global context employs a causal attention mechanism to extract long-range information in a content-dependent manner. Our experiments show that the proposed framework outperforms state-of-the-art methods in rate-distortion performance.
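To make the causal-attention idea concrete, below is a minimal PyTorch sketch of a masked self-attention context module that predicts entropy parameters for each latent from the latents decoded before it. This is an illustrative assumption, not the paper's actual architecture: the module name `CausalContextSketch`, the single attention layer, the learned start token, and the raster-scan ordering are all hypothetical choices for exposition.

```python
import torch
import torch.nn as nn

class CausalContextSketch(nn.Module):
    """Hypothetical global-context module: predicts entropy parameters for
    each latent position using only previously decoded latents, via a
    causally masked self-attention layer."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.start = nn.Parameter(torch.zeros(1, 1, dim))  # stand-in for "no context yet"
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, 2 * dim)  # per-position (mu, sigma) of a Gaussian

    def forward(self, y_hat: torch.Tensor) -> torch.Tensor:
        # y_hat: (B, N, dim) quantized latents, flattened in decoding order.
        b, n, _ = y_hat.shape
        # Shift right so the query at step i only ever sees y_hat[< i].
        x = torch.cat([self.start.expand(b, -1, -1), y_hat[:, :-1]], dim=1)
        # Boolean mask where True entries are *blocked*: the strictly upper
        # triangle forbids attending to future steps, keeping the model
        # decodable by an arithmetic decoder.
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool, device=x.device), 1)
        ctx, _ = self.attn(x, x, x, attn_mask=mask)
        return self.proj(ctx)  # entropy parameters for each latent position
```

In a real coder, `y_hat` would come from the quantizer, and the predicted (mu, sigma) would parameterize the conditional distribution fed to the entropy coder; the paper additionally fuses this global context with two hyperpriors capturing cross-channel and spatial dependencies, which this sketch omits.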
Authors: Atefeh Khoshkhahtinat, Ali Zafari, Piyush M. Mehta, Mohammad Akyash, Hossein Kashiani, Nasser M. Nasrabadi