Continuous Cost Aggregation for Dual-Pixel Disparity Extraction

Published 13 Jun 2023 in cs.CV (arXiv:2306.07921v1)

Abstract: Recent works have shown that depth information can be obtained from Dual-Pixel (DP) sensors. A DP arrangement provides two views in a single shot, thus resembling a stereo image pair with a tiny baseline. However, the different point spread function (PSF) per view, as well as the small disparity range, makes the use of typical stereo matching algorithms problematic. To address the above shortcomings, we propose a Continuous Cost Aggregation (CCA) scheme within a semi-global matching framework that is able to provide accurate continuous disparities from DP images. The proposed algorithm fits parabolas to matching costs and aggregates parabola coefficients along image paths. The aggregation step is performed subject to a quadratic constraint that not only enforces the disparity smoothness but also maintains the quadratic form of the total costs. This gives rise to an inherently efficient disparity propagation scheme with a pixel-wise minimization in closed-form. Furthermore, the continuous form allows for a robust multi-scale aggregation that better compensates for the varying PSF. Experiments on DP data from both DSLR and phone cameras show that the proposed scheme attains state-of-the-art performance in DP disparity estimation.


Summary

  • The paper introduces Continuous Cost Aggregation, a method that fits parabolic models to matching costs for precise sub-pixel disparity extraction in dual-pixel sensors.
  • The approach employs multi-scale disparity fusion with quadratic constraints, effectively compensating for varying point spread functions across resolutions.
  • Performance evaluations on DSLR and mobile-phone images demonstrate state-of-the-art accuracy without extensive training, underscoring its real-world applicability.

Introduction

The paper introduces a novel method for extracting disparity information from dual-pixel (DP) sensors, which are increasingly prevalent in modern cameras. Unlike conventional stereo cameras, DP sensors provide two views from a single shot, yielding a stereo pair with a very small baseline. This characteristic poses challenges for traditional stereo matching algorithms, because each view has a different point spread function (PSF) and the disparity range is very limited. The proposed method, Continuous Cost Aggregation (CCA), addresses these issues by integrating a continuous cost optimization approach within a semi-global matching framework, achieving accurate sub-pixel disparities from DP images.

Proposed Method

The CCA method begins by fitting parabolas to the matching costs and aggregating the parabola coefficients along image paths. The aggregation is subject to a quadratic constraint that both enforces disparity smoothness and preserves the quadratic form of the total cost. This yields a highly efficient propagation scheme in which the per-pixel minimization has a closed-form solution. The continuous formulation also enables robust multi-scale aggregation, which compensates for the PSF that varies across resolutions and devices. As a result, the method estimates disparity accurately on both DSLR and mobile-phone DP images.
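The summary does not spell out the update equations, but the core mechanism can be illustrated with a short sketch. The Python snippet below is a minimal, hypothetical illustration, not the authors' implementation: it fits a parabola a·d² + b·d + c to matching costs sampled at three integer disparities, propagates the coefficients along a single scanline under a quadratic smoothness penalty λ(d − d′)², and reads off the per-pixel disparity from the closed-form minimizer d* = −b/(2a). The key property it demonstrates is that minimizing over the previous pixel's disparity keeps the accumulated cost quadratic, so only three coefficients per pixel need to be propagated.

```python
import numpy as np

def fit_parabola(c_minus, c_zero, c_plus):
    """Fit a*d^2 + b*d + c through costs sampled at d = -1, 0, +1."""
    a = 0.5 * (c_plus + c_minus) - c_zero
    b = 0.5 * (c_plus - c_minus)
    return a, b, c_zero

def aggregate_scanline(A, B, C, lam):
    """Left-to-right aggregation of per-pixel parabola coefficients with a
    quadratic smoothness penalty lam*(d - d_prev)^2.

    Assumes convex fits (a > 0); a real implementation would guard against
    non-convex or degenerate parabolas and aggregate over several paths.
    """
    n = len(A)
    Sa, Sb, Sc = np.empty(n), np.empty(n), np.empty(n)
    Sa[0], Sb[0], Sc[0] = A[0], B[0], C[0]
    for i in range(1, n):
        a_prev, b_prev, c_prev = Sa[i - 1], Sb[i - 1], Sc[i - 1]
        # min over d' of  a_prev*d'^2 + b_prev*d' + c_prev + lam*(d - d')^2
        # is again a parabola in d (infimal convolution of two quadratics):
        k = a_prev * lam / (a_prev + lam)          # new curvature
        m = -b_prev / (2.0 * a_prev)               # previous minimizer
        c_min = c_prev - b_prev**2 / (4.0 * a_prev)  # previous minimum value
        # smoothed previous term k*(d - m)^2 + c_min, expanded and added
        # to the current data term:
        Sa[i] = A[i] + k
        Sb[i] = B[i] - 2.0 * k * m
        Sc[i] = C[i] + k * m**2 + c_min
    return Sa, Sb, Sc

def disparity_from_parabola(Sa, Sb):
    """Closed-form per-pixel minimizer d* = -b / (2a)."""
    return -Sb / (2.0 * Sa)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    # Toy per-pixel costs sampled at disparities -1, 0, +1.
    cm, c0, cp = rng.random(n) + 1.0, rng.random(n), rng.random(n) + 1.0
    A, B, C = fit_parabola(cm, c0, cp)
    Sa, Sb, Sc = aggregate_scanline(A, B, C, lam=0.5)
    print(disparity_from_parabola(Sa, Sb))
```

In the full semi-global setting, such scanline passes run along multiple image directions and their accumulated costs are combined before the final per-pixel minimization; that part is omitted here for brevity.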

Key Contributions

  1. Continuous Cost Aggregation: The introduction of a continuous cost aggregation mechanism offers an efficient solution to disparity extraction challenges inherent in DP images. By using parabolas to model continuous costs, the algorithm allows for precise sub-pixel disparity estimation while minimizing computation.
  2. Multi-Scale Disparity Fusion: The algorithm incorporates multi-scale disparity fusion, correcting errors and improving performance in areas with significant depth-of-field variation. This multi-scale approach is instrumental in handling the varying PSF across different DP devices (see the fusion sketch after this list).
  3. Performance Evaluation: Extensive evaluations using DP data from DSLR and phone cameras illustrate the algorithm's state-of-the-art performance. The algorithm achieves high accuracy without the need for extensive datasets or training, unlike many learning-based approaches.
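As a rough illustration of how a fusion step could look under the quadratic constraint, the sketch below sums per-scale parabola coefficients so that the fused cost remains a single quadratic whose minimizer gives the final disparity. This is an assumption about the fusion rule, not the paper's exact scheme, and the helper name `fuse_scales` is hypothetical; it assumes the per-scale coefficient maps have already been upsampled to the finest resolution with disparities expressed in finest-scale pixel units.

```python
import numpy as np

def fuse_scales(parabolas, weights):
    """Combine per-scale quadratic costs a*d^2 + b*d + c into one parabola per pixel.

    parabolas: list of (a, b, c) coefficient maps, assumed already resampled
               to the finest resolution and rescaled to finest-scale disparity units.
    weights:   per-scale scalar weights.
    A weighted sum of parabolas is still a parabola, so the fused disparity
    keeps the closed form d* = -b / (2a).
    """
    a = sum(w * p[0] for w, p in zip(weights, parabolas))
    b = sum(w * p[1] for w, p in zip(weights, parabolas))
    c = sum(w * p[2] for w, p in zip(weights, parabolas))
    return a, b, c, -b / (2.0 * a)
```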

Performance and Evaluation

The performance tests on DP datasets show that the proposed CCA method attains state-of-the-art results in disparity estimation. Its adaptability is demonstrated across DSLR and smartphone platforms, underscoring its robustness without device-specific training. Additionally, CCA shows promise for conventional stereo images, competing well against existing methods in both accuracy and computational efficiency.

Implications and Future Work

The implications of this research extend to various applications in computer vision where depth estimation plays a crucial role, such as autonomous navigation and image manipulation. The technique's device-independent nature and ability to operate without extensive training datasets make it ideal for real-world applications where cross-device compatibility is essential. Future work could explore advanced priors in multi-scale aggregation, enhanced cost functions leveraging deep learning, and higher-order polynomial representations for expanded use cases.

Conclusion

The proposed Continuous Cost Aggregation method is a compelling advance in DP disparity extraction, providing a robust, efficient, and largely device-agnostic solution. It meets the growing demand for accurate depth estimation across diverse camera systems without requiring extensive training data.
