Multi-task Planar Reconstruction with Feature Warping Guidance (2311.14981v2)
Abstract: Piece-wise planar 3D reconstruction simultaneously segments plane instances and recovers their 3D plane parameters from an image, which is particularly useful for indoor or man-made environments. Efficient reconstruction of 3D planes coupled with semantic predictions offers advantages for a wide range of applications requiring scene understanding and concurrent spatial mapping. However, most existing planar reconstruction models either neglect semantic predictions or do not run efficiently enough for real-time applications. We introduce SOLOPlanes, a real-time planar reconstruction model based on a modified instance segmentation architecture which simultaneously predicts semantics for each plane instance, along with plane parameters and piece-wise plane instance masks. We achieve an improvement in instance mask segmentation by including multi-view guidance for plane predictions in the training process. This cross-task improvement, in which training for plane prediction also improves mask segmentation, stems from feature sharing in multi-task learning. At inference time, our model predicts semantics from a single image while running in real time at 43 FPS.