FFCA-Net: Stereo Image Compression via Fast Cascade Alignment of Side Information (2312.16963v2)
Abstract: Multi-view compression technology, especially Stereo Image Compression (SIC), plays a crucial role in car-mounted cameras and 3D-related applications. Distributed Source Coding (DSC) theory suggests that correlated sources can be compressed efficiently through independent encoding and joint decoding, which has motivated the rapid development of deep distributed SIC methods in recent years. However, these approaches neglect the unique characteristics of stereo-imaging tasks and incur high decoding latency. To address these limitations, we propose a Feature-based Fast Cascade Alignment network (FFCA-Net) that fully leverages the side information at the decoder. FFCA adopts a coarse-to-fine cascaded alignment approach. In the first stage, FFCA uses a feature-domain patch-matching module based on stereo priors. This module removes redundancy from the search space of naive matching methods and reduces the noise they introduce. In the second stage, an hourglass-based sparse stereo refinement network further aligns inter-image features at a reduced computational cost. Finally, we devise a lightweight yet high-performance feature fusion network, the Fast Feature Fusion network (FFF), to decode the aligned features. Experimental results on the InStereo2K, KITTI, and Cityscapes datasets show that our approach outperforms both traditional and learning-based SIC methods. In particular, it decodes 3 to 10 times faster than other methods.
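To illustrate why a stereo prior shrinks the matching search space, here is a minimal sketch, not the FFCA module itself. It assumes a rectified stereo pair, where a point in the left view appears on the same row of the right view, shifted left by a non-negative disparity. The function names (`sad`, `match_along_row`, `make_pair`), the sum-of-absolute-differences cost, and the synthetic data are all hypothetical choices made for this example; the paper operates on learned features rather than raw intensities.

```python
def sad(a, b, top_a, col_a, top_b, col_b, size):
    """Sum of absolute differences between two size x size patches."""
    total = 0.0
    for dy in range(size):
        for dx in range(size):
            total += abs(a[top_a + dy][col_a + dx] - b[top_b + dy][col_b + dx])
    return total


def match_along_row(left, right, top, col, size, max_disp):
    """Search only the same rows of `right`, at disparities 0..max_disp.

    Returns (best_disparity, best_cost). An unconstrained matcher would
    instead scan every (row, column) position in `right`.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        cand_col = col - d  # assumed prior: the match lies to the left in the right view
        if cand_col < 0:
            break
        cost = sad(left, right, top, col, top, cand_col, size)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


def make_pair(height, width, shift):
    """Synthetic rectified pair: `right` is `left` shifted by `shift` columns."""
    left = [[(7 * x + 13 * y + x * y) % 31 for x in range(width)]
            for y in range(height)]
    right = [[left[y][min(x + shift, width - 1)] for x in range(width)]
             for y in range(height)]
    return left, right


left, right = make_pair(height=8, width=24, shift=3)
print(match_along_row(left, right, top=2, col=10, size=4, max_disp=6))  # (3, 0.0)
```

In this toy setting the constrained search evaluates at most `max_disp + 1` candidates per patch, whereas an exhaustive 2D search over the right view would evaluate one candidate per valid patch position.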