Bit Rate Matching Algorithm Optimization in JPEG-AI Verification Model (2402.17487v1)
Abstract: Research on neural network (NN) based image compression has shown superior performance compared to classical compression frameworks. Unlike the hand-engineered transforms in the classical frameworks, NN-based models learn non-linear transforms that provide more compact bit representations, and they achieve faster coding speed on parallel devices than their classical counterparts. These properties have attracted the attention of both the scientific and industrial communities, resulting in the JPEG-AI standardization activity. The verification model for the JPEG-AI standardization process is already in development and has surpassed the advanced VVC intra codec. To generate reconstructed images at the desired bits per pixel and to assess the BD-rate performance of both the JPEG-AI verification model and VVC intra, bit rate matching is employed. However, the current bit rate matching in the JPEG-AI verification model is significantly slow and, because it selects an unsuitable model, yields suboptimal performance. The proposed methodology offers a gradual algorithmic optimization for matching bit rates, resulting in a fourfold acceleration and over 1% improvement in BD-rate at the base operation point. At the high operation point, the acceleration increases up to sixfold.
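The bit rate matching described above amounts to searching for a codec rate setting whose encoded output hits a target bits-per-pixel. As a minimal sketch, assuming a monotonic scalar rate parameter and a generic `encode` callable (both illustrative assumptions, not the actual JPEG-AI verification model API), the search can be done by bisection:

```python
# Hypothetical sketch of bit rate matching via bisection over a codec's
# rate parameter. Assumptions: encode(rate_param) returns the achieved
# bits-per-pixel and is monotonically increasing in rate_param over
# [lo, hi]. This is an illustration, not the JPEG-AI reference algorithm.

def match_bit_rate(encode, target_bpp, lo=0.0, hi=1.0,
                   tol=0.01, max_iters=20):
    """Find a rate parameter whose encoded bpp is within tol (relative)
    of target_bpp. Returns (rate_param, achieved_bpp) for the closest
    match found."""
    best = None
    for _ in range(max_iters):
        mid = 0.5 * (lo + hi)
        bpp = encode(mid)  # one full encode per probe: the costly step
        if best is None or abs(bpp - target_bpp) < abs(best[1] - target_bpp):
            best = (mid, bpp)
        if abs(bpp - target_bpp) <= tol * target_bpp:
            break
        if bpp < target_bpp:
            lo = mid   # need a higher rate
        else:
            hi = mid   # need a lower rate
    return best
```

Because each probe requires a full encode, the number of iterations dominates the matching time; this is why reducing or reordering such probes, as the paper's optimization does, translates directly into wall-clock speedup.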