
Multi-task Planar Reconstruction with Feature Warping Guidance (2311.14981v2)

Published 25 Nov 2023 in cs.CV

Abstract: Piece-wise planar 3D reconstruction simultaneously segments plane instances and recovers their 3D plane parameters from an image, which is particularly useful for indoor or man-made environments. Efficient reconstruction of 3D planes coupled with semantic predictions offers advantages for a wide range of applications requiring scene understanding and concurrent spatial mapping. However, most existing planar reconstruction models either neglect semantic predictions or do not run efficiently enough for real-time applications. We introduce SOLOPlanes, a real-time planar reconstruction model based on a modified instance segmentation architecture which simultaneously predicts semantics for each plane instance, along with plane parameters and piece-wise plane instance masks. We achieve an improvement in instance mask segmentation by including multi-view guidance for plane predictions in the training process. This cross-task improvement, training for plane prediction but improving the mask segmentation, is due to the nature of feature sharing in multi-task learning. Our model simultaneously predicts semantics using single images at inference time, while achieving real-time predictions at 43 FPS.
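As an illustration of what "3D plane parameters" buy you, the sketch below recovers per-pixel depth from a single plane under a pinhole camera. It is not taken from the paper: the plane convention (unit normal n and offset d with n · X = d in camera coordinates), the function name `plane_depth`, and the intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions chosen for the example, and SOLOPlanes may parameterize planes differently.

```python
def plane_depth(u, v, normal, offset, fx, fy, cx, cy):
    """Depth along the camera z-axis where the ray through pixel (u, v) hits a plane.

    Assumed convention (hypothetical, not from the paper): the plane satisfies
    normal . X = offset in camera coordinates, with `normal` a unit vector.
    """
    # Back-project the pixel to a ray with unit z-component: K^-1 [u, v, 1]^T.
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-9:
        # The ray is parallel to the plane, so there is no intersection.
        return None
    # A point on the ray is X = z * ray; substituting into normal . X = offset
    # gives z = offset / (normal . ray).
    return offset / denom
```

Evaluating this at every pixel inside a plane's instance mask would give a dense depth map for that plane, which is why a mask plus plane parameters is enough to reconstruct a planar region from a single image.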

