
X-PDNet: Accurate Joint Plane Instance Segmentation and Monocular Depth Estimation with Cross-Task Distillation and Boundary Correction (2309.08424v2)

Published 15 Sep 2023 in cs.CV

Abstract: Segmentation of planar regions from a single RGB image is a particularly important task in the perception of complex scenes. To utilize both visual and geometric properties in images, recent approaches often formulate the problem as a joint estimation of planar instances and dense depth through feature fusion mechanisms and geometric constraint losses. Despite promising results, these methods do not consider cross-task feature distillation and perform poorly in boundary regions. To overcome these limitations, we propose X-PDNet, a framework for the multitask learning of plane instance segmentation and depth estimation with improvements in two aspects. First, we construct a cross-task distillation design that promotes early information sharing between the two tasks so that each task benefits from the other. Second, we highlight the current limitations of using the ground-truth boundary to develop a boundary regression loss, and propose a novel method that exploits depth information to support precise boundary region segmentation. Finally, we manually annotate more than 3000 images from the Stanford 2D-3D-Semantics dataset and make them available for the evaluation of plane instance segmentation. In experiments, the proposed methods outperform the baseline by large margins in quantitative results on the ScanNet and Stanford 2D-3D-S datasets, demonstrating the effectiveness of our proposals.
