
Wavelet-Like Transform-Based Technology in Response to the Call for Proposals on Neural Network-Based Image Coding (2403.05937v1)

Published 9 Mar 2024 in cs.CV and eess.IV

Abstract: Neural network-based image coding has developed rapidly since its inception. By 2022, its performance had surpassed that of the best-performing traditional image coding framework, H.266/VVC. In light of this success, the IEEE 1857.11 working subgroup initiated a neural network-based image coding standard project and issued a corresponding call for proposals (CfP). In response to the CfP, this paper introduces a novel wavelet-like transform-based end-to-end image coding framework, iWaveV3. iWaveV3 adds several new features to our previous wavelet-like transform-based framework, iWave++: an affine wavelet-like transform, a perceptually friendly quality metric, and more advanced training and online optimization strategies. While retaining simultaneous support for lossy and lossless compression, iWaveV3 achieves state-of-the-art compression efficiency for objective quality and is very competitive for perceptual quality. As a result, iWaveV3 has been adopted as a candidate scheme for developing the IEEE standard for neural network-based image coding.
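To make concrete how a single wavelet-like transform can serve lossless coding, here is a minimal sketch of one integer lifting step, the classic structure on which such transforms are commonly built. It is an illustration under stated assumptions, not the iWaveV3 implementation: the names `forward_lift` and `inverse_lift` are hypothetical, and the simple `predict` and `update` functions stand in for whatever learned operators a real codec would use.

```python
def forward_lift(signal, predict, update):
    """One integer lifting step on an even-length list of integers."""
    even, odd = signal[0::2], signal[1::2]
    # Detail = odd samples minus a rounded prediction made from the even samples.
    detail = [o - round(predict(e)) for o, e in zip(odd, even)]
    # Approximation = even samples plus a rounded update made from the details.
    approx = [e + round(update(d)) for e, d in zip(even, detail)]
    return approx, detail


def inverse_lift(approx, detail, predict, update):
    """Undo forward_lift exactly by replaying the same steps in reverse."""
    even = [a - round(update(d)) for a, d in zip(approx, detail)]
    odd = [d + round(predict(e)) for d, e in zip(detail, even)]
    signal = [0] * (len(even) + len(odd))
    signal[0::2] = even
    signal[1::2] = odd
    return signal


# Hypothetical stand-ins for learned predict/update operators.
predict = lambda e: 0.9 * e + 0.3
update = lambda d: 0.5 * d

pixels = [12, 15, 9, 7, 200, 198, 3, 0]
approx, detail = forward_lift(pixels, predict, update)
restored = inverse_lift(approx, detail, predict, update)
```

Because the inverse subtracts exactly the rounded quantities the forward pass added, `restored` equals `pixels` whatever `predict` and `update` compute, provided they are deterministic. That property is what allows a lifting-style transform to be lossless when its coefficients are coded without quantization, and lossy when they are quantized.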
