AiSDF: Structure-aware Neural Signed Distance Fields in Indoor Scenes (2403.01861v1)
Abstract: The indoor scenes we live in are often visually homogeneous or textureless, yet they inherently possess structural forms that provide strong structural priors for 3D scene reconstruction. Motivated by this observation, we propose a structure-aware online signed distance field (SDF) reconstruction framework for indoor scenes, in particular under the Atlanta world (AW) assumption; we thus dub this incremental SDF reconstruction for AW AiSDF. Within the online framework, we infer the underlying Atlanta structure of a given scene and then estimate planar surfel regions supporting that structure. This Atlanta-aware surfel representation provides an explicit planar map of the scene. In addition, based on these Atlanta planar surfel regions, we adaptively sample points and constrain structural regularity in the SDF reconstruction, which improves reconstruction quality by preserving high-level structure while enhancing scene details. We evaluate AiSDF on the ScanNet and ReplicaCAD datasets and demonstrate that the proposed framework can implicitly reconstruct fine object details as well as explicitly reconstruct structures in room-scale scenes.
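To make the structural-regularity idea concrete, below is a minimal sketch (not the authors' implementation) of an Atlanta-aware regularization term for a neural SDF. Under the AW assumption, each planar surfel is supported by one vertical or one of several horizontal dominant directions; the sketch penalizes SDF normals on surfel samples that deviate from their assigned Atlanta direction. The names `sdf_net`, `surfel_points`, and `atlanta_dirs` are illustrative assumptions, as is the simple additive loss weighting.

```python
import torch
import torch.nn.functional as F

def atlanta_regularity_loss(sdf_net, surfel_points, atlanta_dirs):
    """Hypothetical Atlanta-world structural regularity term for a neural SDF.

    sdf_net:       a PyTorch module mapping (N, 3) points to signed distances.
    surfel_points: (N, 3) points sampled on detected Atlanta planar surfels.
    atlanta_dirs:  (N, 3) unit vectors, the Atlanta direction (one vertical or
                   a horizontal dominant direction) supporting each surfel.
    """
    pts = surfel_points.clone().requires_grad_(True)
    sdf = sdf_net(pts)
    # The SDF gradient w.r.t. the input point approximates the surface normal.
    (grad,) = torch.autograd.grad(
        sdf, pts, grad_outputs=torch.ones_like(sdf), create_graph=True)
    normals = F.normalize(grad, dim=-1)
    # On a plane orthogonal to direction d, the normal should satisfy |n . d| = 1.
    align = 1.0 - (normals * atlanta_dirs).sum(dim=-1).abs()
    # Surfel samples lie on the surface, so their signed distance should be ~0.
    on_surface = sdf.squeeze(-1).abs()
    return (align + on_surface).mean()
```

In an online loop, a term like this would plausibly be weighted against the usual SDF supervision (and an eikonal term keeping the gradient at unit norm) and evaluated on points adaptively sampled from the current Atlanta surfel map; the relative weights here are placeholders, not values from the paper.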
Authors: Jaehoon Jang, Inha Lee, Minje Kim, Kyungdon Joo