Wavelet-Like Transform-Based Technology in Response to the Call for Proposals on Neural Network-Based Image Coding (2403.05937v1)
Abstract: Neural network-based image coding has developed rapidly since its inception. By 2022, its performance had surpassed that of the best-performing traditional image coding framework -- H.266/VVC. In light of this success, the IEEE 1857.11 working subgroup initiated a neural network-based image coding standardization project and issued a corresponding call for proposals (CfP). In response to the CfP, this paper introduces a novel wavelet-like transform-based end-to-end image coding framework -- iWaveV3. iWaveV3 builds on our previous wavelet-like transform-based framework, iWave++, and incorporates new features such as an affine wavelet-like transform, a perceptual-friendly quality metric, and more advanced training and online optimization strategies. While retaining support for both lossy and lossless compression, iWaveV3 achieves state-of-the-art compression efficiency in terms of objective quality and is highly competitive in terms of perceptual quality. As a result, iWaveV3 has been adopted as a candidate scheme for the development of the IEEE standard for neural network-based image coding.
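To make the "wavelet-like transform" idea concrete, the sketch below shows a single lifting step in which the classical predict and update filters are replaced by small learned networks, in the spirit of the iWave line of work. This is a minimal illustration under assumed choices (a 1-D single-level split, tiny CNN filters, PyTorch), not the actual iWaveV3 architecture.

```python
# Minimal sketch of a lifting-scheme "wavelet-like" transform with learned
# predict/update filters. Module sizes and the 1-D single-level split are
# illustrative assumptions; this does not reproduce iWaveV3.
import torch
import torch.nn as nn


class LiftingStep(nn.Module):
    """One predict/update lifting pair with learned filters (assumed design)."""

    def __init__(self, channels: int = 1, hidden: int = 16):
        super().__init__()
        # Small CNNs stand in for the predict (P) and update (U) operators.
        self.predict = nn.Sequential(
            nn.Conv1d(channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, channels, 3, padding=1),
        )
        self.update = nn.Sequential(
            nn.Conv1d(channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor):
        # Lazy wavelet split: even and odd samples along the last axis.
        even, odd = x[..., 0::2], x[..., 1::2]
        # Predict the odd samples from the even ones; the residual forms
        # the high-pass (detail) subband.
        high = odd - self.predict(even)
        # Update the even samples with the details to form the low-pass band.
        low = even + self.update(high)
        return low, high

    def inverse(self, low: torch.Tensor, high: torch.Tensor):
        # Lifting is structurally invertible: undo the steps in reverse order.
        even = low - self.update(high)
        odd = high + self.predict(even)
        # Re-interleave even/odd samples to recover the input signal.
        return torch.stack((even, odd), dim=-1).flatten(-2)


if __name__ == "__main__":
    torch.manual_seed(0)
    step = LiftingStep()
    signal = torch.randn(1, 1, 64)  # (batch, channel, length)
    low, high = step(signal)
    recon = step.inverse(low, high)
    # Reconstruction is exact up to floating-point error, regardless of the
    # (untrained) filter weights, because of the lifting structure.
    print(torch.allclose(signal, recon, atol=1e-5))
```

The design point this sketch illustrates is that the lifting structure is invertible by construction, whatever the learned filters compute; this structural invertibility is what allows a single wavelet-like transform to serve both lossy and lossless coding, as claimed in the abstract.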