
Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery (2403.11812v1)

Published 18 Mar 2024 in cs.CV

Abstract: We present a neural radiance field method for urban-scale semantic and building-level instance segmentation from aerial images by lifting noisy 2D labels to 3D. This problem is challenging for two primary reasons. First, objects in urban aerial images, including buildings, cars, and roads, vary substantially in size, which makes accurate 2D segmentation difficult. Second, the 2D labels generated by existing segmentation methods suffer from multi-view inconsistency, especially for aerial images, where each image captures only a small portion of the entire scene. To overcome these limitations, we first introduce a scale-adaptive semantic label fusion strategy that enhances the segmentation of objects of varying sizes by combining labels predicted from different altitudes, harnessing the novel-view synthesis capabilities of NeRF. We then introduce a novel cross-view instance label grouping strategy based on the 3D scene representation to mitigate the multi-view inconsistency problem in the 2D instance labels. Furthermore, we exploit multi-view reconstructed depth priors to improve the geometric quality of the reconstructed radiance field, resulting in enhanced segmentation results. Experiments on multiple real-world urban-scale datasets demonstrate that our approach outperforms existing methods.
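
To make "lifting noisy 2D labels to 3D" concrete, the sketch below shows the generic way a semantic radiance field can be supervised by 2D labels: per-sample class probabilities along a camera ray are accumulated with the standard NeRF volume-rendering weights, and the rendered per-pixel distribution is compared against the (possibly noisy) 2D label with a cross-entropy loss. This is an illustration of the general idea only, not the authors' implementation; the function names, the toy ray, and the two-class setup are all assumptions made for this example.

```python
import math

def render_weights(densities, deltas):
    """Standard NeRF volume-rendering weights for the samples along one ray."""
    weights = []
    transmittance = 1.0  # fraction of light that reaches the current sample
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this sample
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights

def render_semantics(densities, deltas, class_probs):
    """Accumulate per-sample class probabilities into one per-pixel vector."""
    weights = render_weights(densities, deltas)
    num_classes = len(class_probs[0])
    pixel = [0.0] * num_classes
    for w, probs in zip(weights, class_probs):
        for c in range(num_classes):
            pixel[c] += w * probs[c]
    return pixel

def cross_entropy(pixel_probs, label_2d, eps=1e-8):
    """Loss between the rendered class vector and a (noisy) 2D label index."""
    return -math.log(pixel_probs[label_2d] + eps)

# Hypothetical toy ray: empty space first, then a dense surface of class 1.
densities = [0.0, 5.0, 5.0]
deltas = [1.0, 1.0, 1.0]
class_probs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights = render_weights(densities, deltas)
pixel = render_semantics(densities, deltas, class_probs)
loss_matching = cross_entropy(pixel, 1)     # 2D label agrees with the surface
loss_conflicting = cross_entropy(pixel, 0)  # 2D label disagrees
```

Because every view is rendered from the same 3D field, conflicting labels from different images pull on shared parameters, which is what lets a 3D representation average out per-image label noise. The fusion and grouping strategies described in the abstract operate on top of this kind of supervision and are not reproduced here.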

Authors (4)
  1. Yuqi Zhang
  2. Guanying Chen
  3. Jiaxing Chen
  4. Shuguang Cui
