
Learning Monocular Depth from Focus with Event Focal Stack (2405.06944v1)

Published 11 May 2024 in cs.CV

Abstract: Depth from Focus estimates depth by determining the moment of maximum focus from multiple shots at different focal distances, i.e., the Focal Stack. However, the limited sampling rate of conventional optical cameras makes it difficult to obtain sufficient focus cues during the focal sweep. Inspired by biological vision, the event camera records intensity changes over time with extremely low latency, providing richer temporal information for determining the time of best focus. In this study, we propose the EDFF Network to estimate sparse depth from the Event Focal Stack. Specifically, we use the event voxel grid to encode intensity-change information and project the event time surface into the depth domain to preserve per-pixel focal-distance information. A Focal-Distance-guided Cross-Modal Attention Module is presented to fuse this information. Additionally, we propose a Multi-level Depth Fusion Block designed to integrate results from each level of a UNet-like architecture and produce the final output. Extensive experiments validate that our method outperforms existing state-of-the-art approaches.
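The abstract mentions encoding the event stream as a voxel grid before fusion. Below is a minimal sketch of one common way such a grid is built from raw events (x, y, t, polarity), using bilinear interpolation along the time axis; the bin count, sensor resolution, and normalization choices here are assumptions drawn from standard event-based vision practice, not the paper's exact EDFF encoding.

```python
# Illustrative sketch only: a generic event voxel grid, not the paper's
# exact EDFF representation. Bin count and sensor size are assumptions.
import numpy as np

def events_to_voxel_grid(events, num_bins=5, height=260, width=346):
    """Accumulate events into a (num_bins, H, W) voxel grid.

    events: (N, 4) array with columns [x, y, t, p], polarity p in {-1, +1}.
    Each event's polarity is split between its two neighbouring temporal
    bins with weights given by bilinear interpolation in time.
    """
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return voxel

    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2].astype(np.float64)
    p = events[:, 3].astype(np.float32)

    # Normalize timestamps to the range [0, num_bins - 1].
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    t0 = np.floor(t_norm).astype(np.int64)
    frac = (t_norm - t0).astype(np.float32)

    # Scatter-add each event into its lower and upper temporal bin.
    for b, w in ((t0, 1.0 - frac), (np.clip(t0 + 1, 0, num_bins - 1), frac)):
        np.add.at(voxel, (b, y, x), p * w)

    return voxel
```

The same event stream can also be summarized as a per-pixel time surface (the timestamp of the most recent event at each pixel), which is the kind of representation the paper projects into the depth domain to retain per-pixel focal-distance information.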

