Learning Monocular Depth from Focus with Event Focal Stack (2405.06944v1)
Abstract: Depth from Focus estimates depth by determining the moment of maximum focus from multiple shots taken at different focal distances, i.e., the Focal Stack. However, the limited sampling rate of conventional optical cameras makes it difficult to obtain sufficient focus cues during the focal sweep. Inspired by biological vision, the event camera records intensity changes over time with extremely low latency, providing richer temporal information for locating the moment of best focus. In this study, we propose the EDFF Network to estimate sparse depth from the Event Focal Stack. Specifically, we use an event voxel grid to encode intensity-change information and project the event time surface into the depth domain to preserve per-pixel focal-distance information. A Focal-Distance-guided Cross-Modal Attention Module is presented to fuse these two representations. Additionally, we propose a Multi-level Depth Fusion Block that integrates the results from each level of a UNet-like architecture to produce the final output. Extensive experiments validate that our method outperforms existing state-of-the-art approaches.
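The abstract's exact network design is not spelled out here, but the two input encodings it names are standard event-camera representations. Below is a minimal PyTorch sketch, under assumptions, of (a) a temporal voxel grid built with bilinear interpolation along the time axis and (b) a per-pixel time surface mapped to focal distance assuming a linear focal sweep from `f_start` to `f_end`; the function names, the bin count, and the linear-sweep mapping are illustrative choices, not the authors' exact formulation.

```python
import torch


def event_voxel_grid(xs, ys, ts, ps, num_bins, height, width):
    """Accumulate event polarities into a (num_bins, H, W) voxel grid,
    spreading each event over its two nearest temporal bins.

    xs, ys: pixel coordinates (1-D long tensors)
    ts:     timestamps (1-D float tensor)
    ps:     polarities as +1/-1 (1-D float tensor)
    """
    grid = torch.zeros(num_bins, height * width)
    # Normalize timestamps to [0, num_bins - 1].
    t_norm = (ts - ts.min()) / (ts.max() - ts.min() + 1e-9) * (num_bins - 1)
    t0 = t_norm.floor().long()
    t1 = torch.clamp(t0 + 1, max=num_bins - 1)
    frac = t_norm - t0.float()
    pix = ys.long() * width + xs.long()  # flattened pixel index
    # Bilinear split of each event's polarity between adjacent bins.
    grid.index_put_((t0, pix), ps * (1.0 - frac), accumulate=True)
    grid.index_put_((t1, pix), ps * frac, accumulate=True)
    return grid.view(num_bins, height, width)


def time_surface_to_focal_distance(xs, ys, ts, height, width, f_start, f_end):
    """Per-pixel timestamp of the most recent event, mapped to a focal
    distance under an *assumed* linear focal sweep from f_start to f_end."""
    t_norm = (ts - ts.min()) / (ts.max() - ts.min() + 1e-9)
    surface = torch.zeros(height * width)
    pix = ys.long() * width + xs.long()
    # Keep the latest (maximum) normalized timestamp observed at each pixel.
    surface.scatter_reduce_(0, pix, t_norm, reduce="amax", include_self=True)
    return (f_start + surface * (f_end - f_start)).view(height, width)
```

In this sketch, the voxel grid preserves when and where brightness changed during the sweep, while the time-surface tensor carries a per-pixel estimate of the focal distance at which the last sharp transition occurred; a fusion module such as the paper's cross-modal attention would then combine the two before depth regression.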