Saliency-aware End-to-end Learned Variable-Bitrate 360-degree Image Compression (2402.08862v1)
Abstract: Effective compression of 360$^\circ$ images, also referred to as omnidirectional images (ODIs), is of high interest for virtual reality (VR) and related applications. 2D image compression methods ignore the equator-biased nature of ODIs and fail to address the oversampling near the poles, leading to inefficient compression when applied to ODIs. We present a new learned saliency-aware 360$^\circ$ image compression architecture that prioritizes bit allocation to more significant regions, taking the unique properties of ODIs into account. By assigning fewer bits to less important regions, substantial data size reductions can be achieved while maintaining high visual quality in the salient regions. To the best of our knowledge, this is the first study to propose an end-to-end variable-rate model for compressing 360$^\circ$ images leveraging saliency information. The results show significant bit-rate savings over state-of-the-art learned and traditional ODI compression methods at similar perceptual visual quality.
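The central idea described above, steering bits toward salient regions while accounting for the latitude-dependent oversampling of equirectangular images, can be illustrated with a saliency- and sphere-weighted rate-distortion objective. The sketch below is a minimal, hypothetical illustration and not the paper's actual loss: the `sphere_weights` helper, the `alpha` emphasis factor, and the specific way the two weightings are combined are assumptions made for clarity.

```python
import torch

def sphere_weights(height: int, width: int, device="cpu") -> torch.Tensor:
    """Cosine-of-latitude weights for an equirectangular (ERP) image.

    Rows near the poles are oversampled in ERP, so they receive smaller
    weights, in the spirit of WS-PSNR-style spherical weighting.
    """
    rows = torch.arange(height, device=device, dtype=torch.float32)
    # Row i maps to latitude (i + 0.5)/H * pi - pi/2.
    lat = (rows + 0.5) / height * torch.pi - torch.pi / 2.0
    w = torch.cos(lat).clamp(min=0.0)                     # shape (H,)
    return w.view(1, 1, height, 1).expand(1, 1, height, width)

def saliency_weighted_rd_loss(x, x_hat, bits, saliency, lam=0.01, alpha=2.0):
    """Rate-distortion loss with saliency- and sphere-weighted distortion.

    x, x_hat : (N, C, H, W) original and reconstructed ERP images in [0, 1]
    bits     : estimated total bitstream length from an entropy model, in bits
    saliency : (N, 1, H, W) saliency map in [0, 1]
    lam      : rate-distortion trade-off factor
    alpha    : extra emphasis on salient regions (hypothetical knob)
    """
    n, _, h, w = x.shape
    w_sphere = sphere_weights(h, w, device=x.device)
    # Salient pixels get up to (1 + alpha) times the weight of non-salient ones.
    w_total = w_sphere * (1.0 + alpha * saliency)
    w_total = w_total / w_total.mean()                    # keep the loss scale stable
    distortion = (w_total * (x - x_hat) ** 2).mean()      # weighted MSE
    rate = bits / (n * h * w)                             # bits per pixel
    return rate + lam * distortion
```

In a variable-rate setting, a single model would typically be conditioned on (or swept over) `lam` rather than training one network per rate point.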