Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging (2402.18102v2)
Abstract: Passive, compact, single-shot 3D sensing is useful in many application areas, such as microscopy, medical imaging, surgical navigation, and autonomous driving, where form factor, time, and power constraints can exist. Obtaining RGB-D scene information over a short imaging distance, in an ultra-compact form factor, and in a passive, snapshot manner is challenging. Dual-pixel (DP) sensors are a potential solution. DP sensors collect light rays from two different halves of the lens in two interleaved pixel arrays, thus capturing two slightly different views of the scene, like a stereo camera system. However, imaging with a DP sensor implies that the defocus blur size is directly proportional to the disparity seen between the views. This creates a trade-off between disparity estimation accuracy and deblurring accuracy. To improve this trade-off, we propose CADS (Coded Aperture Dual-Pixel Sensing), in which we use a coded aperture in the imaging lens along with a DP sensor. In our approach, we jointly learn an optimal coded pattern and the reconstruction algorithm in an end-to-end optimization setting. Our resulting CADS imaging system demonstrates an improvement of >1.5 dB PSNR in all-in-focus (AIF) estimates and 5-6% in depth estimation quality over naive DP sensing for a wide range of aperture settings. Furthermore, we build CADS prototypes for DSLR photography settings and in endoscope and dermoscope form factors. Our novel coded dual-pixel sensing approach demonstrates accurate RGB-D reconstruction results in simulations and real-world experiments in a passive, snapshot, and compact manner.
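The coupling between defocus blur and DP disparity described in the abstract can be illustrated with a toy geometric-optics model. The sketch below is not the paper's image formation model or its learned code. It assumes an idealized circular aperture that the DP sensor splits into exact left and right halves, ignores diffraction and lens aberrations, and uses hypothetical names (`dp_psfs`, `centroid_disparity`) and a hand-picked binary mask chosen only for illustration.

```python
import numpy as np

def dp_psfs(blur_radius_px, mask=None, size=65):
    """Toy left/right dual-pixel PSFs for one defocus level (size must be odd).

    blur_radius_px: signed defocus blur radius in pixels; the sign flips
        across the focal plane (assumed convention).
    mask: optional 2D aperture transmittance in [0, 1], sampled over the
        normalized aperture square [-1, 1]^2 (hypothetical coded aperture).
    """
    c = size // 2
    y, x = np.mgrid[-c:c + 1, -c:c + 1].astype(float)
    r = max(abs(blur_radius_px), 1e-6)
    s = 1.0 if blur_radius_px >= 0 else -1.0
    # Map each sensor pixel back to normalized aperture coordinates (u, v).
    u, v = s * x / r, s * y / r
    aperture = (u**2 + v**2 <= 1.0).astype(float)
    if mask is not None:
        # Nearest-neighbour lookup of the mask transmittance.
        n_rows, n_cols = mask.shape
        iu = np.clip(np.rint((u + 1) / 2 * (n_cols - 1)), 0, n_cols - 1).astype(int)
        iv = np.clip(np.rint((v + 1) / 2 * (n_rows - 1)), 0, n_rows - 1).astype(int)
        aperture = aperture * mask[iv, iu]
    # Each DP view sees one half of the aperture; the centre column is shared.
    left = aperture * ((u < 0) + 0.5 * (u == 0))
    right = aperture * ((u > 0) + 0.5 * (u == 0))
    return left / max(left.sum(), 1e-12), right / max(right.sum(), 1e-12)

def centroid_disparity(left, right):
    """Horizontal shift (pixels) between the centroids of the two PSFs."""
    c = left.shape[1] // 2
    x = np.arange(-c, c + 1, dtype=float)
    return (right.sum(axis=0) * x).sum() - (left.sum(axis=0) * x).sum()

if __name__ == "__main__":
    for radius in (5, 10, 20, -10):
        d = centroid_disparity(*dp_psfs(radius))
        print(f"blur radius {radius:+d} px -> disparity {d:+.2f} px")
    # Hypothetical binary code that blocks the central vertical strip.
    code = np.ones((9, 9))
    code[:, 3:6] = 0.0
    d_coded = centroid_disparity(*dp_psfs(10, mask=code))
    print(f"coded aperture, blur radius +10 px -> disparity {d_coded:+.2f} px")
```

In this toy model the disparity of an open aperture grows linearly with the blur radius (about 8/(3π) ≈ 0.85 times the radius for a half-disk), which is the trade-off the abstract refers to: a larger disparity signal comes with a larger blur to undo. Passing a mask changes the shape of both PSFs at a fixed blur radius, which is the degree of freedom a coded aperture adds.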