Deep Planar Parallax for Monocular Depth Estimation (2301.03178v2)
Abstract: Recent research has highlighted the utility of Planar Parallax Geometry in monocular depth estimation. However, its potential has yet to be fully realized because networks rely heavily on appearance for depth prediction. Our in-depth analysis reveals that flow pre-training helps the network make better use of consecutive-frame modeling, leading to a substantial performance improvement. Additionally, we propose Planar Position Embedding (PPE) to handle dynamic objects that violate the static-scene assumption and to tackle slope variations that are difficult to distinguish. Comprehensive experiments on autonomous driving datasets, namely KITTI and the Waymo Open Dataset (WOD), show that our Planar Parallax Network (PPNet) significantly outperforms existing learning-based methods.