
ECSIC: Epipolar Cross Attention for Stereo Image Compression (2307.10284v2)

Published 18 Jul 2023 in eess.IV, cs.CV, and cs.LG

Abstract: In this paper, we present ECSIC, a novel learned method for stereo image compression. Our proposed method compresses the left and right images in a joint manner by exploiting the mutual information between the images of the stereo image pair using a novel stereo cross attention (SCA) module and two stereo context modules. The SCA module performs cross-attention restricted to the corresponding epipolar lines of the two images and processes them in parallel. The stereo context modules improve the entropy estimation of the second encoded image by using the first image as a context. We conduct an extensive ablation study demonstrating the effectiveness of the proposed modules and a comprehensive quantitative and qualitative comparison with existing methods. ECSIC achieves state-of-the-art performance in stereo image compression on the two popular stereo image datasets Cityscapes and InStereo2k while allowing for fast encoding and decoding.
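The abstract says the SCA module restricts cross-attention to corresponding epipolar lines and processes those lines in parallel. In a rectified stereo pair, corresponding epipolar lines are the same image row in both views, so the restriction amounts to treating the row index as a batch dimension. The NumPy sketch below illustrates only that idea and rests on several assumptions: rectified inputs, a single attention head, no positional encoding, normalization, or residual connections, and random matrices standing in for learned projections. The function name `epipolar_cross_attention` is hypothetical, and this is not ECSIC's actual SCA implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def epipolar_cross_attention(
    query_feats: np.ndarray,
    context_feats: np.ndarray,
    w_q: np.ndarray,
    w_k: np.ndarray,
    w_v: np.ndarray,
) -> np.ndarray:
    """Hypothetical single-head cross-attention restricted to epipolar lines.

    query_feats, context_feats: (H, W, C) features of a rectified stereo pair,
    so each epipolar line is one image row. w_q, w_k, w_v: (C, D) projections
    (random stand-ins here for learned weights).
    Returns (H, W, D): each position in row h of query_feats attends only to
    the W positions in row h of context_feats.
    """
    q = query_feats @ w_q    # (H, W, D)
    k = context_feats @ w_k  # (H, W, D)
    v = context_feats @ w_v  # (H, W, D)
    d = q.shape[-1]
    # The row index acts as a batch dimension, so all epipolar lines are
    # processed in parallel; scores has shape (H, W_query, W_context).
    scores = q @ np.swapaxes(k, 1, 2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ v


# Usage on random features (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
H, W, C, D = 4, 8, 16, 16
left = rng.standard_normal((H, W, C))
right = rng.standard_normal((H, W, C))
w_q, w_k, w_v = (rng.standard_normal((C, D)) / np.sqrt(C) for _ in range(3))

# Swapping the arguments gives attention in both directions.
left_from_right = epipolar_cross_attention(left, right, w_q, w_k, w_v)
right_from_left = epipolar_cross_attention(right, left, w_q, w_k, w_v)
print(left_from_right.shape)  # (4, 8, 16)
```

Because each query sees only the W positions on its own row, the score tensor holds H·W² entries rather than the (H·W)² entries of unrestricted attention over the whole image; for the 4 × 8 example that is 256 scores instead of 1024.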
