
Pyramid Feature Attention Network for Monocular Depth Prediction (2403.01440v1)

Published 3 Mar 2024 in cs.CV

Abstract: Deep convolutional neural networks (DCNNs) have achieved great success in monocular depth estimation (MDE). However, few existing works take into account the contributions that feature maps at different levels make to MDE, leading to inaccurate spatial layouts, ambiguous boundaries, and discontinuous object surfaces in the predictions. To tackle these problems, we propose a Pyramid Feature Attention Network (PFANet) to improve both the high-level context features and the low-level spatial features. In the proposed PFANet, we design a Dual-scale Channel Attention Module (DCAM) that applies channel attention at different scales, aggregating global context and local information from the high-level feature maps. To exploit the spatial relationships among visual features, we design a Spatial Pyramid Attention Module (SPAM) that guides the network's attention to multi-scale detailed information in the low-level feature maps. Finally, we introduce a scale-invariant gradient loss to increase the penalty on errors in depth-wise discontinuous regions. Experimental results show that our method outperforms state-of-the-art methods on the KITTI dataset.
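The abstract's dual-scale channel attention can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes an SE-style squeeze-and-excite gate (global average pool, bottleneck MLP, sigmoid), applied once over the whole feature map and once per spatial quadrant, with the two branches fused by summation. The layer weights are random stand-ins and the fusion rule is an assumption.

```python
import numpy as np

def channel_attention(feat, reduction=4, seed=0):
    """SE-style channel attention on a (C, H, W) feature map.

    Squeeze: global average pool over the spatial dims -> (C,).
    Excite: bottleneck MLP (random weights here, purely for
    illustration) followed by a per-channel sigmoid gate.
    """
    rng = np.random.default_rng(seed)
    c = feat.shape[0]
    squeezed = feat.mean(axis=(1, 2))              # (C,)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ squeezed, 0.0)        # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid, in (0, 1)
    return feat * gate[:, None, None]              # reweight channels

def dual_scale_channel_attention(feat):
    """Channel attention at two scales: one global branch over the
    whole map, one local branch computed per 2x2 spatial quadrant.
    Summation as the fusion rule is an assumption for this sketch."""
    global_branch = channel_attention(feat)
    c, h, w = feat.shape
    local = np.empty_like(feat)
    for i in range(2):
        for j in range(2):
            ys = slice(i * h // 2, (i + 1) * h // 2)
            xs = slice(j * w // 2, (j + 1) * w // 2)
            local[:, ys, xs] = channel_attention(feat[:, ys, xs])
    return global_branch + local
```

The local branch lets the gate adapt to regional statistics (e.g. sky vs. road in a driving scene), while the global branch keeps scene-level context; summing the branches preserves both signals at the original resolution.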
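The scale-invariant gradient loss mentioned in the abstract can be sketched as follows. This assumes the common formulation used in depth-prediction work: discrete depth gradients at several strides, normalized by local magnitude so the penalty is invariant to the absolute depth scale, compared between prediction and ground truth with an L1 penalty. The strides and the L1 choice are assumptions, not details taken from the paper.

```python
import numpy as np

def _normalized_grad(depth, h):
    """Stride-h discrete gradients, normalized by local magnitude
    so the result is invariant to a global scaling of the depth."""
    eps = 1e-8
    gx = (depth[:, h:] - depth[:, :-h]) / (
        np.abs(depth[:, h:]) + np.abs(depth[:, :-h]) + eps)
    gy = (depth[h:, :] - depth[:-h, :]) / (
        np.abs(depth[h:, :]) + np.abs(depth[:-h, :]) + eps)
    return gx, gy

def scale_invariant_gradient_loss(pred, gt, strides=(1, 2, 4)):
    """Sum of L1 differences between normalized gradient maps of the
    predicted and ground-truth depth over several strides. Large
    strides emphasize depth discontinuities at object boundaries."""
    loss = 0.0
    for h in strides:
        pgx, pgy = _normalized_grad(pred, h)
        tgx, tgy = _normalized_grad(gt, h)
        loss += np.mean(np.abs(pgx - tgx)) + np.mean(np.abs(pgy - tgy))
    return loss
```

Because the gradient terms are normalized, a prediction that is correct up to a constant positive scale incurs no gradient penalty; errors concentrate where the predicted depth changes differently from the ground truth, i.e. at discontinuous regions.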
