Exploring Resolution Fields for Scalable Image Compression with Uncertainty Guidance (2306.08941v1)
Abstract: Recently, learning-based image compression methods have made significant advances, surpassing traditional coding standards. Most of them prioritize achieving the best rate-distortion performance at a particular compression rate, which limits their flexibility and adaptability in applications with complex and varying constraints. In this work, we explore the potential of resolution fields in scalable image compression and propose the reciprocal pyramid network (RPN), which fulfills the need for more adaptable and versatile compression. Specifically, RPN first builds a compression pyramid and generates resolution fields at different levels in a top-down manner. The key design lies in the cross-resolution context mining module between adjacent levels, which performs feature enrichment and distillation to mine meaningful contextualized information and remove unnecessary redundancy, producing informative resolution fields as residual priors. Scalability is achieved by progressively reusing bitstreams and incorporating resolution fields that vary across levels. Furthermore, between adjacent compression levels, we explicitly quantify the aleatoric uncertainty of the bottom-level decoded representations and develop an uncertainty-guided loss to update the upper-level compression parameters, forming a reverse pyramid process that forces the network to focus on textured pixels with high variance for more reliable and accurate reconstruction. By combining resolution field exploration and uncertainty guidance in a pyramid manner, RPN effectively achieves spatially and quality-scalable image compression. Experiments show the superiority of RPN over existing classical and deep learning-based scalable codecs. Code will be available at https://github.com/JGIroro/RPNSIC.
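The abstract does not spell out the form of the uncertainty-guided loss. Below is a minimal PyTorch sketch of one plausible reading, assuming the standard heteroscedastic aleatoric formulation of Kendall & Gal (2017): a per-pixel variance map predicted from the lower-level decoded representation reweights the upper-level reconstruction error. The module name, the `log_var` prediction head, and the usage loop are illustrative assumptions, not the authors' exact definitions.

```python
import torch
import torch.nn as nn


class UncertaintyGuidedLoss(nn.Module):
    """Heteroscedastic (aleatoric) reconstruction loss.

    A sketch of the general idea described in the abstract: pixels with
    high predicted variance (textured regions) are emphasized when
    updating the upper-level compression parameters. The paper's exact
    formulation may differ; this follows the common aleatoric loss
    L = exp(-log_var) * |x_hat - x| + log_var.
    """

    def forward(self, x_hat: torch.Tensor, x: torch.Tensor,
                log_var: torch.Tensor) -> torch.Tensor:
        # log_var: per-pixel log-variance estimated from the bottom-level
        # decoded representation (hypothetical prediction head).
        residual = torch.abs(x_hat - x)
        # Attenuate the residual by the predicted variance and add a
        # regularizer so the network cannot inflate the variance freely.
        return (torch.exp(-log_var) * residual + log_var).mean()


# Illustrative usage (shapes only; not the authors' training loop):
if __name__ == "__main__":
    x = torch.rand(1, 3, 64, 64)          # ground-truth patch
    x_hat = torch.rand(1, 3, 64, 64)      # upper-level reconstruction
    log_var = torch.zeros(1, 3, 64, 64)   # predicted log-variance map
    loss = UncertaintyGuidedLoss()(x_hat, x, log_var)
    print(loss.item())
```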