FlowDepth: Decoupling Optical Flow for Self-Supervised Monocular Depth Estimation (2403.19294v1)
Abstract: Self-supervised multi-frame methods have recently achieved promising results in depth estimation. However, these methods often suffer from mismatch problems caused by moving objects, which violate the static-scene assumption. In addition, unfairness can arise when photometric errors are computed in high-frequency or low-texture regions of the images. Existing approaches address these issues with additional black-box networks that provide semantic priors to separate moving objects, and they improve the model only at the loss level. We therefore propose FlowDepth, in which a Dynamic Motion Flow Module (DMFM) decouples the optical flow with a mechanism-based approach and warps the dynamic regions, thereby solving the mismatch problem. To address the unfairness of photometric errors caused by high-frequency and low-texture regions, we apply Depth-Cue-Aware Blur (DCABlur) at the input level and a cost-volume sparsity loss at the loss level. Experimental results on the KITTI and Cityscapes datasets show that our method outperforms state-of-the-art methods.
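The abstract describes decoupling the optical flow so that dynamic regions can be handled separately from the camera-induced motion. A minimal sketch of the standard decomposition this kind of module can build on is shown below: the rigid flow induced by camera motion is computed from depth, intrinsics, and relative pose, and the residual between the full optical flow and this rigid flow is attributed to independently moving objects. All function names, shapes, and the residual formulation here are illustrative assumptions, not the paper's actual DMFM interface.

```python
# Sketch (NumPy): decompose full optical flow into a camera-induced rigid part
# and a residual "dynamic" part. Shapes and names are assumptions for illustration.
import numpy as np

def rigid_flow(depth, K, T):
    """Backproject pixels with `depth` (H, W), transform by relative pose `T`
    (4x4, frame t -> t+1), reproject with intrinsics `K` (3x3), and return
    the induced 2-D flow field of shape (H, W, 2)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))                        # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)   # (H, W, 3)
    cam = (np.linalg.inv(K) @ pix.reshape(-1, 3).T) * depth.reshape(1, -1)  # 3-D points, (3, H*W)
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])                  # homogeneous, (4, H*W)
    cam2 = (T @ cam_h)[:3]                                                # points in the next frame
    proj = K @ cam2
    proj = (proj[:2] / np.clip(proj[2:], 1e-6, None)).T.reshape(H, W, 2)  # reprojected pixels
    return proj - pix[..., :2]                                            # rigid (ego-motion) flow

def decouple_flow(full_flow, depth, K, T):
    """Split `full_flow` (H, W, 2) into a camera-induced rigid component and
    a residual dynamic component covering independently moving objects."""
    rigid = rigid_flow(depth, K, T)
    dynamic = full_flow - rigid
    return rigid, dynamic
```

Under this decomposition, the dynamic component can be used to warp only the regions covered by moving objects before computing photometric errors, which is the role the abstract attributes to DMFM; the mechanism used in the paper itself may differ in detail.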