AiSDF: Structure-aware Neural Signed Distance Fields in Indoor Scenes (2403.01861v1)

Published 4 Mar 2024 in cs.RO, cs.AI, and cs.CV

Abstract: Indoor scenes we are living in are visually homogenous or textureless, while they inherently have structural forms and provide enough structural priors for 3D scene reconstruction. Motivated by this fact, we propose a structure-aware online signed distance fields (SDF) reconstruction framework in indoor scenes, especially under the Atlanta world (AW) assumption. Thus, we dub this incremental SDF reconstruction for AW as AiSDF. Within the online framework, we infer the underlying Atlanta structure of a given scene and then estimate planar surfel regions supporting the Atlanta structure. This Atlanta-aware surfel representation provides an explicit planar map for a given scene. In addition, based on these Atlanta planar surfel regions, we adaptively sample and constrain the structural regularity in the SDF reconstruction, which enables us to improve the reconstruction quality by maintaining a high-level structure while enhancing the details of a given scene. We evaluate the proposed AiSDF on the ScanNet and ReplicaCAD datasets, where we demonstrate that the proposed framework is capable of reconstructing fine details of objects implicitly, as well as structures explicitly in room-scale scenes.

References (42)
  1. A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, “Scannet: Richly-annotated 3d reconstructions of indoor scenes,” in CVPR, 2017.
  2. R. Cabral and Y. Furukawa, “Piecewise planar and compact floorplan reconstruction from images,” in CVPR, 2014.
  3. T. Schöps, T. Sattler, and M. Pollefeys, “Surfelmeshing: Online surfel-based mesh reconstruction,” IEEE TPAMI, 2019.
  4. P. Mittal, Y.-C. Cheng, M. Singh, and S. Tulsiani, “AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation,” in CVPR, 2022.
  5. Y. Jiang, D. Ji, Z. Han, and M. Zwicker, “SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization,” in CVPR, 2020.
  6. M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith, C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, “Chomp: Covariant hamiltonian optimization for motion planning,” IJRR, 2013.
  7. E. Sucar, S. Liu, J. Ortiz, and A. J. Davison, “iMAP: Implicit mapping and positioning in real-time,” in ICCV, 2021.
  8. Z. Zhu, S. Peng, V. Larsson, W. Xu, H. Bao, Z. Cui, M. R. Oswald, and M. Pollefeys, “Nice-SLAM: Neural implicit scalable encoding for SLAM,” in CVPR, 2022.
  9. J. Ortiz, A. Clegg, J. Dong, E. Sucar, D. Novotny, M. Zollhoefer, and M. Mukadam, “iSDF: Real-Time Neural Signed Distance Fields for Robot Perception,” in RSS, 2022.
  10. J. M. Coughlan and A. L. Yuille, “Manhattan world: Compass direction from a single image by bayesian inference,” in ICCV, 1999.
  11. G. Schindler and F. Dellaert, “Atlanta World: An Expectation Maximization Framework for Simultaneous Low-Level Edge Grouping and Camera Calibration in Complex Man-Made Environments,” in CVPR, 2004.
  12. S. Gupta, P. Arbelaez, and J. Malik, “Perceptual organization and recognition of indoor scenes from rgb-d images,” in CVPR, 2013.
  13. L. Carlone, R. Tron, K. Daniilidis, and F. Dellaert, “Initialization techniques for 3D SLAM: a survey on rotation estimation and its use in pose graph optimization,” in ICRA, 2015.
  14. P. Kim, B. Coltin, and H. J. Kim, “Low-drift visual odometry in structured environments by decoupling rotational and translational motion,” in ICRA, 2018.
  15. K. Joo, P. Kim, M. Hebert, I. S. Kweon, and H. J. Kim, “Linear RGB-D SLAM for structured environments,” IEEE TPAMI, 2021.
  16. A. Szot, A. Clegg, E. Undersander, E. Wijmans, Y. Zhao, J. Turner, N. Maestre, M. Mukadam, D. Chaplot, O. Maksymets, A. Gokaslan, V. Vondrus, S. Dharur, F. Meier, W. Galuba, A. Chang, Z. Kira, V. Koltun, J. Malik, M. Savva, and D. Batra, “Habitat 2.0: Training home assistants to rearrange their habitat,” in NeurIPS, 2021.
  17. H. Oleynikova, Z. Taylor, M. Fehr, R. Siegwart, and J. Nieto, “Voxblox: Incremental 3d euclidean signed distance fields for on-board mav planning,” in IROS, 2017.
  18. J. Straub, O. Freifeld, G. Rosman, J. J. Leonard, and J. W. Fisher, “The Manhattan frame model—Manhattan world inference in the space of surface normals,” IEEE TPAMI, 2017.
  19. K. Joo, T.-H. Oh, I. S. Kweon, and J.-C. Bazin, “Globally optimal inlier set maximization for Atlanta world understanding,” IEEE TPAMI, 2019.
  20. N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from rgbd images,” in ECCV, 2012.
  21. W. Choi, Y.-W. Chao, C. Pantofaru, and S. Savarese, “Understanding indoor scenes using 3d geometric phrases,” in CVPR, 2013.
  22. H. Wildenauer and A. Hanbury, “Robust camera self-calibration from monocular images of manhattan worlds,” in CVPR, 2012.
  23. L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, “Occupancy networks: Learning 3d reconstruction in function space,” in CVPR, 2019.
  24. B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” in ECCV, 2020.
  25. J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove, “Deepsdf: Learning continuous signed distance functions for shape representation,” in CVPR, 2019.
  26. R. Po, Z. Dong, A. W. Bergman, and G. Wetzstein, “Instant continual learning of neural radiance fields,” in ICCVW, 2023.
  27. Z. Murez, T. v. As, J. Bartolozzi, A. Sinha, V. Badrinarayanan, and A. Rabinovich, “Atlas: End-to-end 3d scene reconstruction from posed images,” in ECCV, 2020.
  28. J. Sun, Y. Xie, L. Chen, X. Zhou, and H. Bao, “Neuralrecon: Real-time coherent 3d reconstruction from monocular video,” in CVPR, 2021.
  29. Z. Yan, Y. Tian, X. Shi, P. Guo, P. Wang, and H. Zha, “Continual neural mapping: Learning an implicit scene representation from sequential observations,” in CVPR, 2021.
  30. A. Dai and M. Nießner, “Neural poisson: Indicator functions for neural fields,” arXiv, 2022.
  31. L. Yariv, J. Gu, Y. Kasten, and Y. Lipman, “Volume rendering of neural implicit surfaces,” in NeurIPS, 2021.
  32. P. Wang, L. Liu, Y. Liu, C. Theobalt, T. Komura, and W. Wang, “Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction,” NeurIPS, 2021.
  33. D. Azinović, R. Martin-Brualla, D. B. Goldman, M. Nießner, and J. Thies, “Neural rgb-d surface reconstruction,” in CVPR, 2022.
  34. J. Wang, P. Wang, X. Long, C. Theobalt, T. Komura, L. Liu, and W. Wang, “Neuris: Neural reconstruction of indoor scenes using normal priors,” in ECCV, 2022.
  35. Z. Yu, S. Peng, M. Niemeyer, T. Sattler, and A. Geiger, “Monosdf: Exploring monocular geometric cues for neural implicit surface reconstruction,” NeurIPS, 2022.
  36. H. Guo, S. Peng, H. Lin, Q. Wang, G. Zhang, H. Bao, and X. Zhou, “Neural 3D Scene Reconstruction with the Manhattan-world Assumption,” in CVPR, 2022.
  37. K. Joo, T.-H. Oh, J. Kim, and I. S. Kweon, “Robust and globally optimal Manhattan frame estimation in near real time,” IEEE TPAMI, 2018.
  38. A. Gropp, L. Yariv, N. Haim, M. Atzmon, and Y. Lipman, “Implicit geometric regularization for learning shapes,” in ICML, 2020.
  39. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., “Pytorch: An imperative style, high-performance deep learning library,” NeurIPS, 2019.
  40. I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” ICLR, 2018.
  41. R. J. Griffin, G. Wiedebach, S. McCrory, S. Bertrand, I. Lee, and J. Pratt, “Footstep planning for autonomous walking over rough terrain,” in HUMANOIDS, 2019.
  42. “Roomplan, Apple ARKit,” https://machinelearning.apple.com/research/roomplan.
Authors (4)
  1. Jaehoon Jang (2 papers)
  2. Inha Lee (2 papers)
  3. Minje Kim (53 papers)
  4. Kyungdon Joo (15 papers)

Summary

AiSDF: Structure-aware Neural Signed Distance Fields in Indoor Scenes

Introducing AiSDF

Recent work on scene reconstruction has shown growing interest in neural implicit representations, most notably Signed Distance Fields (SDFs). An SDF gives, for any point in space, the distance to the nearest surface, with the sign indicating whether the point lies inside or outside that surface; this property has made SDFs useful across computer vision and robotics. Building on this representation, the paper introduces AiSDF, a framework for online SDF reconstruction in structured indoor environments under the Atlanta World (AW) assumption.
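To make the representation concrete, the following is a minimal illustrative sketch in NumPy (not the paper's neural network) of what an SDF computes: a signed distance that is positive outside a surface, zero on it, and negative inside, here for an analytic floor plane and a sphere.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: positive outside, negative inside."""
    return np.linalg.norm(points - center, axis=-1) - radius

def plane_sdf(points, normal, offset):
    """Signed distance to the plane n·x + d = 0 with unit normal n."""
    return points @ normal + offset

# Query three 3D points against a toy scene (floor plane + one sphere).
pts = np.array([[0.0, 0.5, 0.0],    # above the floor, outside the sphere
                [0.0, 1.0, 0.0],    # exactly on the sphere surface
                [0.0, 2.0, 0.0]])   # at the sphere center
floor = plane_sdf(pts, normal=np.array([0.0, 1.0, 0.0]), offset=0.0)
ball = sphere_sdf(pts, center=np.array([0.0, 2.0, 0.0]), radius=1.0)
scene = np.minimum(floor, ball)     # union of shapes = pointwise minimum of SDFs
print(scene)                        # [0.5, 0.0, -1.0] -> outside, on-surface, inside
```

In AiSDF, as in iSDF, this field is not defined analytically but predicted by a neural network trained online from posed depth images.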

The Essence of Atlanta World Assumption

The AW assumption is the cornerstone of the approach. It posits that an indoor scene is dominated by a small set of directions: a single vertical direction (floor and ceiling) plus several horizontal directions (walls) that are orthogonal to the vertical but not necessarily orthogonal to one another. This relaxes the stricter Manhattan World assumption while still capturing how most man-made interiors are built. Incorporating it lets the reconstruction respect the scene's inherent architectural structure instead of treating every surface as unconstrained geometry.
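As a rough illustration of how the assumption constrains geometry (a hypothetical helper, not the paper's Atlanta frame estimator), the sketch below classifies surface normals against an Atlanta frame made of one vertical direction and several horizontal directions; normals that fit none of them are flagged as unstructured.

```python
import numpy as np

def classify_atlanta(normals, vertical, horizontals, tol_deg=10.0):
    """Assign each unit normal to the closest Atlanta-frame direction
    (vertical or one of the horizontals), or -1 if none is within the
    angular tolerance. Purely illustrative."""
    frame = np.vstack([vertical] + list(horizontals))    # (K, 3) frame directions
    sims = np.abs(normals @ frame.T)                      # (N, K) sign-agnostic cosines
    best = sims.argmax(axis=1)
    best[sims.max(axis=1) < np.cos(np.deg2rad(tol_deg))] = -1  # unstructured region
    return best

vertical = np.array([0.0, 0.0, 1.0])                      # floor/ceiling normal
horizontals = [np.array([1.0, 0.0, 0.0]),                 # two wall directions that are
               np.array([0.5, np.sqrt(3) / 2, 0.0])]      # not mutually orthogonal
normals = np.array([[0.0, 0.0, 1.0],                      # floor
                    [0.99, 0.05, 0.0],                    # wall (slightly noisy normal)
                    [0.6, 0.6, 0.5]])                     # clutter / object surface
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
print(classify_atlanta(normals, vertical, horizontals))   # -> [0, 1, -1]
```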

AiSDF at a Glance

AiSDF is an incremental, online framework that processes a stream of posed depth images to construct an SDF while honoring the underlying Atlanta structure of the scene. The pipeline consists of four main stages:

  1. Estimation of the underlying Atlanta Frame (AF): deducing the vertical direction and the set of horizontal directions that dominate the scene’s structure.
  2. Extraction of Atlanta-Aware Surfel Representation: Upon establishing the AF, the next step involves extracting planar regions supporting the AW assumption in the form of surfels, effectively providing an explicit planar map of the scene.
  3. Atlanta-Aware Sampling: points are sampled adaptively according to their relation to the surfel regions, focusing refinement on complex areas while maintaining structural regularity (a minimal sketch of this idea follows the list).
  4. Iterative Learning and Refinement: Utilizing a structure-aware approach, the network is continuously updated to refine the SDF representation, harnessing both implicit SDF values and explicit planar maps to enhance reconstruction quality.
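The following PyTorch sketch illustrates the sampling-plus-regularity idea under simplified assumptions of our own (hypothetical function names; not the paper's exact sampling strategy or loss): points lying on Atlanta-aligned surfels are subsampled aggressively, and the SDF prediction at those points is pulled toward the signed distance of the supporting plane.

```python
import torch

def atlanta_aware_sample(points, on_surfel_mask, keep_planar=0.2):
    """Keep every point in unstructured regions but only a fraction of the
    points lying on Atlanta-aligned planar surfels (illustrative heuristic)."""
    planar_idx = torch.nonzero(on_surfel_mask).squeeze(1)
    other_idx = torch.nonzero(~on_surfel_mask).squeeze(1)
    n_keep = max(1, int(keep_planar * planar_idx.numel()))
    keep = planar_idx[torch.randperm(planar_idx.numel())[:n_keep]]
    return torch.cat([other_idx, keep])

def planar_regularity_loss(pred_sdf, points, plane_normal, plane_offset):
    """Pull the predicted SDF at planar samples toward the signed
    point-to-plane distance n·x + d of the supporting surfel plane."""
    plane_dist = (points * plane_normal).sum(-1) + plane_offset
    return torch.mean((pred_sdf - plane_dist) ** 2)

# Toy usage: six depth points; the first four lie on a floor surfel (plane z = 0).
pts = torch.tensor([[0.1, 0.2, 0.0], [0.5, 0.1, 0.0], [0.9, 0.7, 0.0],
                    [0.3, 0.4, 0.0], [0.2, 0.2, 0.6], [0.8, 0.1, 0.4]])
mask = torch.tensor([True, True, True, True, False, False])
idx = atlanta_aware_sample(pts, mask, keep_planar=0.5)   # dense off-plane, sparse on-plane

planar_sel = idx[mask[idx]]                               # sampled points on the surfel
pred = torch.full((planar_sel.numel(),), 0.05, requires_grad=True)  # stand-in network output
normal = torch.tensor([0.0, 0.0, 1.0]).expand(planar_sel.numel(), 3)
offset = torch.zeros(planar_sel.numel())
loss = planar_regularity_loss(pred, pts[planar_sel], normal, offset)
loss.backward()   # in AiSDF the gradient would flow into the neural SDF network instead
```

The fixed subsampling ratio and single floor plane here are placeholders; in the actual framework the sample density and the balance between the SDF and planar terms are adaptive and surfel-dependent.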

Performance Characterization

Evaluated on the ScanNet and ReplicaCAD datasets, AiSDF produces more detailed and structurally consistent reconstructions of indoor scenes than comparable online methods such as Voxblox and iSDF. The gains are attributed to Atlanta-aware surfel sampling and the surfel-based loss, which together preserve fine object detail while keeping the room-scale structure intact. In addition, AiSDF generates explicit 3D planar maps alongside the neural implicit map; these are memory-efficient and can serve downstream tasks such as navigation, planning, and interaction in robotics and augmented reality.

Looking Forward

While AiSDF marks a clear step forward in online reconstruction of indoor scenes, it also leaves several avenues for future work. The current system extracts surfels from each keyframe independently, so merging them into a unified, complete explicit planar representation of the scene remains open. Likewise, encoding Atlanta-aware surfels directly into the neural SDF representation is an unexplored direction for further research.

In conclusion, AiSDF underscores the value of structural awareness in neural scene reconstruction, particularly indoors. By combining neural implicit functions with the structural priors supplied by the Atlanta World assumption, it improves reconstruction fidelity and paves the way for richer understanding of, and interaction with, structured environments.