AiSDF: Structure-aware Neural Signed Distance Fields in Indoor Scenes (2403.01861v1)

Published 4 Mar 2024 in cs.RO, cs.AI, and cs.CV

Abstract: Indoor scenes we are living in are visually homogenous or textureless, while they inherently have structural forms and provide enough structural priors for 3D scene reconstruction. Motivated by this fact, we propose a structure-aware online signed distance fields (SDF) reconstruction framework in indoor scenes, especially under the Atlanta world (AW) assumption. Thus, we dub this incremental SDF reconstruction for AW as AiSDF. Within the online framework, we infer the underlying Atlanta structure of a given scene and then estimate planar surfel regions supporting the Atlanta structure. This Atlanta-aware surfel representation provides an explicit planar map for a given scene. In addition, based on these Atlanta planar surfel regions, we adaptively sample and constrain the structural regularity in the SDF reconstruction, which enables us to improve the reconstruction quality by maintaining a high-level structure while enhancing the details of a given scene. We evaluate the proposed AiSDF on the ScanNet and ReplicaCAD datasets, where we demonstrate that the proposed framework is capable of reconstructing fine details of objects implicitly, as well as structures explicitly in room-scale scenes.

References (42)
  1. A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, “Scannet: Richly-annotated 3d reconstructions of indoor scenes,” in CVPR, 2017.
  2. R. Cabral and Y. Furukawa, “Piecewise planar and compact floorplan reconstruction from images,” in CVPR, 2014.
  3. T. Schöps, T. Sattler, and M. Pollefeys, “Surfelmeshing: Online surfel-based mesh reconstruction,” IEEE TPAMI, 2019.
  4. P. Mittal, Y.-C. Cheng, M. Singh, and S. Tulsiani, “AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation,” in CVPR, 2022.
  5. Y. Jiang, D. Ji, Z. Han, and M. Zwicker, “SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization,” in CVPR, 2020.
  6. M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith, C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, “Chomp: Covariant hamiltonian optimization for motion planning,” IJRR, 2013.
  7. E. Sucar, S. Liu, J. Ortiz, and A. J. Davison, “iMAP: Implicit mapping and positioning in real-time,” in ICCV, 2021.
  8. Z. Zhu, S. Peng, V. Larsson, W. Xu, H. Bao, Z. Cui, M. R. Oswald, and M. Pollefeys, “Nice-SLAM: Neural implicit scalable encoding for SLAM,” in CVPR, 2022.
  9. J. Ortiz, A. Clegg, J. Dong, E. Sucar, D. Novotny, M. Zollhoefer, and M. Mukadam, “iSDF: Real-Time Neural Signed Distance Fields for Robot Perception,” in RSS, 2022.
  10. J. M. Coughlan and A. L. Yuille, “Manhattan world: Compass direction from a single image by bayesian inference,” in ICCV, 1999.
  11. G. Schindler and F. Dellaert, “Atlanta World: An Expectation Maximization Framework for Simultaneous Low-Level Edge Grouping and Camera Calibration in Complex Man-Made Environments,” in CVPR, 2004.
  12. S. Gupta, P. Arbelaez, and J. Malik, “Perceptual organization and recognition of indoor scenes from rgb-d images,” in CVPR, 2013.
  13. L. Carlone, R. Tron, K. Daniilidis, and F. Dellaert, “Initialization techniques for 3D SLAM: a survey on rotation estimation and its use in pose graph optimization,” in ICRA, 2015.
  14. P. Kim, B. Coltin, and H. J. Kim, “Low-drift visual odometry in structured environments by decoupling rotational and translational motion,” in ICRA, 2018.
  15. K. Joo, P. Kim, M. Hebert, I. S. Kweon, and H. J. Kim, “Linear RGB-D SLAM for structured environments,” IEEE TPAMI, 2021.
  16. A. Szot, A. Clegg, E. Undersander, E. Wijmans, Y. Zhao, J. Turner, N. Maestre, M. Mukadam, D. Chaplot, O. Maksymets, A. Gokaslan, V. Vondrus, S. Dharur, F. Meier, W. Galuba, A. Chang, Z. Kira, V. Koltun, J. Malik, M. Savva, and D. Batra, “Habitat 2.0: Training home assistants to rearrange their habitat,” in NeurIPS, 2021.
  17. H. Oleynikova, Z. Taylor, M. Fehr, R. Siegwart, and J. Nieto, “Voxblox: Incremental 3d euclidean signed distance fields for on-board mav planning,” in IROS, 2017.
  18. J. Straub, O. Freifeld, G. Rosman, J. J. Leonard, and J. W. Fisher, “The Manhattan frame model—Manhattan world inference in the space of surface normals,” IEEE TPAMI, 2017.
  19. K. Joo, T.-H. Oh, I. S. Kweon, and J.-C. Bazin, “Globally optimal inlier set maximization for Atlanta world understanding,” IEEE TPAMI, 2019.
  20. N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from rgbd images,” in ECCV, 2012.
  21. W. Choi, Y.-W. Chao, C. Pantofaru, and S. Savarese, “Understanding indoor scenes using 3d geometric phrases,” in CVPR, 2013.
  22. H. Wildenauer and A. Hanbury, “Robust camera self-calibration from monocular images of manhattan worlds,” in CVPR, 2012.
  23. L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, “Occupancy networks: Learning 3d reconstruction in function space,” in CVPR, 2019.
  24. B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” in ECCV, 2020.
  25. J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove, “Deepsdf: Learning continuous signed distance functions for shape representation,” in CVPR, 2019.
  26. R. Po, Z. Dong, A. W. Bergman, and G. Wetzstein, “Instant continual learning of neural radiance fields,” in ICCVW, 2023.
  27. Z. Murez, T. v. As, J. Bartolozzi, A. Sinha, V. Badrinarayanan, and A. Rabinovich, “Atlas: End-to-end 3d scene reconstruction from posed images,” in ECCV, 2020.
  28. J. Sun, Y. Xie, L. Chen, X. Zhou, and H. Bao, “Neuralrecon: Real-time coherent 3d reconstruction from monocular video,” in CVPR, 2021.
  29. Z. Yan, Y. Tian, X. Shi, P. Guo, P. Wang, and H. Zha, “Continual neural mapping: Learning an implicit scene representation from sequential observations,” in CVPR, 2021.
  30. A. Dai and M. Nießner, “Neural poisson: Indicator functions for neural fields,” arXiv, 2022.
  31. L. Yariv, J. Gu, Y. Kasten, and Y. Lipman, “Volume rendering of neural implicit surfaces,” in NeurIPS, 2021.
  32. P. Wang, L. Liu, Y. Liu, C. Theobalt, T. Komura, and W. Wang, “Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction,” NeurIPS, 2021.
  33. D. Azinović, R. Martin-Brualla, D. B. Goldman, M. Nießner, and J. Thies, “Neural rgb-d surface reconstruction,” in CVPR, 2022.
  34. J. Wang, P. Wang, X. Long, C. Theobalt, T. Komura, L. Liu, and W. Wang, “Neuris: Neural reconstruction of indoor scenes using normal priors,” in ECCV, 2022.
  35. Z. Yu, S. Peng, M. Niemeyer, T. Sattler, and A. Geiger, “Monosdf: Exploring monocular geometric cues for neural implicit surface reconstruction,” NeurIPS, 2022.
  36. H. Guo, S. Peng, H. Lin, Q. Wang, G. Zhang, H. Bao, and X. Zhou, “Neural 3D Scene Reconstruction with the Manhattan-world Assumption,” in CVPR, 2022.
  37. K. Joo, T.-H. Oh, J. Kim, and I. S. Kweon, “Robust and globally optimal Manhattan frame estimation in near real time,” IEEE TPAMI, 2018.
  38. A. Gropp, L. Yariv, N. Haim, M. Atzmon, and Y. Lipman, “Implicit geometric regularization for learning shapes,” in ICML, 2020.
  39. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., “Pytorch: An imperative style, high-performance deep learning library,” NeurIPS, 2019.
  40. I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” ICLR, 2018.
  41. R. J. Griffin, G. Wiedebach, S. McCrory, S. Bertrand, I. Lee, and J. Pratt, “Footstep planning for autonomous walking over rough terrain,” in HUMANOIDS, 2019.
  42. “Roomplan, Apple ARKit,” https://machinelearning.apple.com/research/roomplan.
Authors (4)
  1. Jaehoon Jang (2 papers)
  2. Inha Lee (2 papers)
  3. Minje Kim (53 papers)
  4. Kyungdon Joo (15 papers)

Summary

AiSDF: Structure-aware Neural Signed Distance Fields in Indoor Scenes

Introducing AiSDF

Recent work on scene reconstruction has shown growing interest in neural implicit representations, most notably Signed Distance Fields (SDFs). An SDF gives, for any point in space, the distance to the nearest surface, with the sign indicating whether the point lies inside or outside that surface; this property has made SDFs useful across computer vision and robotics. Building on this representation, the paper introduces AiSDF, a framework for online SDF reconstruction in structured indoor environments under the Atlanta World (AW) assumption.
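To make the representation concrete, the following is a minimal illustrative sketch in NumPy (not the paper's neural network) of what an SDF computes: a signed distance that is positive outside a surface, zero on it, and negative inside, here for an analytic floor plane and a sphere.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: positive outside, negative inside."""
    return np.linalg.norm(points - center, axis=-1) - radius

def plane_sdf(points, normal, offset):
    """Signed distance to the plane n·x + d = 0 with unit normal n."""
    return points @ normal + offset

# Query three 3D points against a toy scene (floor plane + one sphere).
pts = np.array([[0.0, 0.5, 0.0],    # above the floor, outside the sphere
                [0.0, 1.0, 0.0],    # exactly on the sphere surface
                [0.0, 2.0, 0.0]])   # at the sphere center
floor = plane_sdf(pts, normal=np.array([0.0, 1.0, 0.0]), offset=0.0)
ball = sphere_sdf(pts, center=np.array([0.0, 2.0, 0.0]), radius=1.0)
scene = np.minimum(floor, ball)     # union of shapes = pointwise minimum of SDFs
print(scene)                        # [0.5, 0.0, -1.0] -> outside, on-surface, inside
```

In AiSDF, as in iSDF, this field is not defined analytically but predicted by a neural network trained online from posed depth images.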

The Essence of Atlanta World Assumption

The AW assumption is the cornerstone of the approach. It posits that an indoor scene is dominated by a small set of directions: a single vertical direction (floor and ceiling) plus several horizontal directions (walls) that are orthogonal to the vertical but not necessarily orthogonal to one another. This relaxes the stricter Manhattan World assumption while still capturing how most man-made interiors are built. Incorporating it lets the reconstruction respect the scene's inherent architectural structure instead of treating every surface as unconstrained geometry.
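As a rough illustration of how the assumption constrains geometry (a hypothetical helper, not the paper's Atlanta frame estimator), the sketch below classifies surface normals against an Atlanta frame made of one vertical direction and several horizontal directions; normals that fit none of them are flagged as unstructured.

```python
import numpy as np

def classify_atlanta(normals, vertical, horizontals, tol_deg=10.0):
    """Assign each unit normal to the closest Atlanta-frame direction
    (vertical or one of the horizontals), or -1 if none is within the
    angular tolerance. Purely illustrative."""
    frame = np.vstack([vertical] + list(horizontals))    # (K, 3) frame directions
    sims = np.abs(normals @ frame.T)                      # (N, K) sign-agnostic cosines
    best = sims.argmax(axis=1)
    best[sims.max(axis=1) < np.cos(np.deg2rad(tol_deg))] = -1  # unstructured region
    return best

vertical = np.array([0.0, 0.0, 1.0])                      # floor/ceiling normal
horizontals = [np.array([1.0, 0.0, 0.0]),                 # two wall directions that are
               np.array([0.5, np.sqrt(3) / 2, 0.0])]      # not mutually orthogonal
normals = np.array([[0.0, 0.0, 1.0],                      # floor
                    [0.99, 0.05, 0.0],                    # wall (slightly noisy normal)
                    [0.6, 0.6, 0.5]])                     # clutter / object surface
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
print(classify_atlanta(normals, vertical, horizontals))   # -> [0, 1, -1]
```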

AiSDF at a Glance

AiSDF is an incremental, online framework that processes a stream of posed depth images to construct an SDF while honoring the underlying Atlanta structure of the scene. The pipeline consists of four main stages:

  1. Estimation of the underlying Atlanta Frame (AF): deducing the vertical direction and the set of horizontal directions that dominate the scene’s structure.
  2. Extraction of Atlanta-Aware Surfel Representation: Upon establishing the AF, the next step involves extracting planar regions supporting the AW assumption in the form of surfels, effectively providing an explicit planar map of the scene.
  3. Atlanta-Aware Sampling: points are sampled adaptively according to their relation to the surfel regions, focusing refinement on complex areas while maintaining structural regularity (a minimal sketch of this idea follows the list).
  4. Iterative Learning and Refinement: Utilizing a structure-aware approach, the network is continuously updated to refine the SDF representation, harnessing both implicit SDF values and explicit planar maps to enhance reconstruction quality.
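The following PyTorch sketch illustrates the sampling-plus-regularity idea under simplified assumptions of our own (hypothetical function names; not the paper's exact sampling strategy or loss): points lying on Atlanta-aligned surfels are subsampled aggressively, and the SDF prediction at those points is pulled toward the signed distance of the supporting plane.

```python
import torch

def atlanta_aware_sample(points, on_surfel_mask, keep_planar=0.2):
    """Keep every point in unstructured regions but only a fraction of the
    points lying on Atlanta-aligned planar surfels (illustrative heuristic)."""
    planar_idx = torch.nonzero(on_surfel_mask).squeeze(1)
    other_idx = torch.nonzero(~on_surfel_mask).squeeze(1)
    n_keep = max(1, int(keep_planar * planar_idx.numel()))
    keep = planar_idx[torch.randperm(planar_idx.numel())[:n_keep]]
    return torch.cat([other_idx, keep])

def planar_regularity_loss(pred_sdf, points, plane_normal, plane_offset):
    """Pull the predicted SDF at planar samples toward the signed
    point-to-plane distance n·x + d of the supporting surfel plane."""
    plane_dist = (points * plane_normal).sum(-1) + plane_offset
    return torch.mean((pred_sdf - plane_dist) ** 2)

# Toy usage: six depth points; the first four lie on a floor surfel (plane z = 0).
pts = torch.tensor([[0.1, 0.2, 0.0], [0.5, 0.1, 0.0], [0.9, 0.7, 0.0],
                    [0.3, 0.4, 0.0], [0.2, 0.2, 0.6], [0.8, 0.1, 0.4]])
mask = torch.tensor([True, True, True, True, False, False])
idx = atlanta_aware_sample(pts, mask, keep_planar=0.5)   # dense off-plane, sparse on-plane

planar_sel = idx[mask[idx]]                               # sampled points on the surfel
pred = torch.full((planar_sel.numel(),), 0.05, requires_grad=True)  # stand-in network output
normal = torch.tensor([0.0, 0.0, 1.0]).expand(planar_sel.numel(), 3)
offset = torch.zeros(planar_sel.numel())
loss = planar_regularity_loss(pred, pts[planar_sel], normal, offset)
loss.backward()   # in AiSDF the gradient would flow into the neural SDF network instead
```

The fixed subsampling ratio and single floor plane here are placeholders; in the actual framework the sample density and the balance between the SDF and planar terms are adaptive and surfel-dependent.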

Performance Characterization

Evaluated on the ScanNet and ReplicaCAD datasets, AiSDF produces more detailed and structurally consistent reconstructions of indoor scenes than comparable online methods such as Voxblox and iSDF. The gains are attributed to Atlanta-aware surfel sampling and the surfel-based loss, which together preserve fine object detail while keeping the room-scale structure intact. In addition, AiSDF generates explicit 3D planar maps alongside the neural implicit map; these are memory-efficient and can serve downstream tasks such as navigation, planning, and interaction in robotics and augmented reality.

Looking Forward

While AiSDF marks a clear step forward in online reconstruction of indoor scenes, it also leaves several avenues for future work. The current system extracts surfels from each keyframe independently, so merging them into a unified, complete explicit planar representation of the scene remains open. Likewise, encoding Atlanta-aware surfels directly into the neural SDF representation is an unexplored direction for further research.

In conclusion, AiSDF underscores the value of structural awareness in neural scene reconstruction, particularly indoors. By combining neural implicit functions with the structural priors supplied by the Atlanta World assumption, it improves reconstruction fidelity and paves the way for richer understanding of, and interaction with, structured environments.