
Lightweight integration of 3D features to improve 2D image segmentation (2212.08334v2)

Published 16 Dec 2022 in cs.CV

Abstract: Scene understanding has made tremendous progress over the past few years, as data acquisition systems now provide an increasing amount of data in various modalities (point cloud, depth, RGB...). However, this improvement comes at a large cost in computational resources and data annotation requirements. To analyze geometric information and images jointly, many approaches rely on both a 2D loss and a 3D loss, requiring not only 2D per-pixel labels but also 3D per-point labels. However, obtaining a 3D ground truth is challenging, time-consuming and error-prone. In this paper, we show that image segmentation can benefit from 3D geometric information without requiring a 3D ground truth, by training the geometric feature extraction and the 2D segmentation network jointly, in an end-to-end fashion, using only the 2D segmentation loss. Our method starts by extracting a map of 3D features directly from a provided point cloud using a lightweight 3D neural network. The 3D feature map, merged with the RGB image, is then used as input to a classical image segmentation network. Our method can be applied to many 2D segmentation networks, significantly improving their performance with only a marginal increase in network weights and light input dataset requirements, since no 3D ground truth is required.
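
The data flow described in the abstract (per-point 3D features turned into an image-aligned feature map, then merged with the RGB image) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how the feature map is aligned with the image, so a pinhole projection with a nearest-point depth test is used here, and the function names `project_features` and `merge_with_rgb` are hypothetical. The per-point features are plain placeholder values standing in for the output of the lightweight 3D network, and the joint training with the 2D segmentation loss is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def project_features(
    points: Sequence[Point],
    feats: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float,
    height: int, width: int,
) -> List[List[List[float]]]:
    """Scatter per-point features into an H x W x C map.

    Assumption: pinhole camera looking down +z; when several points
    fall on the same pixel, the one closest to the camera wins.
    Pixels that receive no point keep an all-zero feature vector.
    """
    channels = len(feats[0]) if feats else 0
    fmap = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    for (x, y, z), f in zip(points, feats):
        if z <= 0.0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))  # pixel column
        v = int(round(fy * y / z + cy))  # pixel row
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            fmap[v][u] = list(f)
    return fmap


def merge_with_rgb(
    rgb: Sequence[Sequence[Sequence[float]]],
    fmap: Sequence[Sequence[Sequence[float]]],
) -> List[List[List[float]]]:
    """Concatenate RGB and feature channels pixel by pixel (H x W x (3 + C))."""
    return [
        [list(rgb[v][u]) + list(fmap[v][u]) for u in range(len(rgb[v]))]
        for v in range(len(rgb))
    ]


# Toy example: a 3 x 3 image and three points with 2-channel placeholder features.
points = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (1.0, 0.0, 1.0)]
feats = [[1.0, 2.0], [9.0, 9.0], [3.0, 4.0]]
rgb = [[[0.5, 0.5, 0.5] for _ in range(3)] for _ in range(3)]

fmap = project_features(points, feats, 1.0, 1.0, 1.0, 1.0, 3, 3)
merged = merge_with_rgb(rgb, fmap)
```

In this sketch the merged array has 3 + C channels per pixel, so a 2D segmentation network consuming it would only need a wider first layer, which is in line with the abstract's claim of a marginal increase in network weights. Channel concatenation is one plausible reading of "merged"; the paper itself may fuse the two inputs differently.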

Authors (4)
  1. Olivier Pradelle (1 paper)
  2. David Wendland (1 paper)
  3. Julie Digne (12 papers)
  4. Raphaelle Chaine (1 paper)
