- The paper introduces Pano2Room, a framework that reconstructs high-quality 3D indoor scenes from a single panorama via novel mesh construction and depth edge filtering.
- It employs iterative mesh completion using a panoramic RGBD inpainter with Stable Diffusion to enhance texture quality and resolve occlusions.
- Experimental results show Pano2Room outperforms methods like PERF and Text2Room in PSNR, SSIM, and LPIPS, underscoring its state-of-the-art performance.
Pano2Room: Novel View Synthesis from a Single Indoor Panorama
Introduction
The paper presents Pano2Room, a novel framework for reconstructing high-quality 3D indoor scenes from a single panoramic image. The primary objective is to synthesize photo-realistic and geometrically consistent novel views from minimal input, namely a single panorama. The problem is challenging due to the complexity of real-world environments and the significant occlusions common in indoor scenes.
Methodology
The proposed Pano2Room framework hinges on converting the input panorama into a preliminary mesh and iteratively refining that mesh with a panoramic RGBD inpainter. The methodology breaks down into three principal modules: Pano2Mesh, iterative mesh completion, and Mesh2GS.
Pano2Mesh
The initial step constructs a mesh from the input panorama: pixels are triangulated in image space and then projected into 3D using a depth map. A novel depth edge filter improves this construction by disconnecting faces that span depth discontinuities, so that distinct objects are separated rather than fused. This significantly improves the accuracy of the generated mesh, particularly in separating nearby objects without losing key textural details.
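A minimal sketch of this construction in NumPy, assuming an equirectangular panorama with per-pixel metric depth; the depth-ratio threshold and helper layout here are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def pano_to_mesh(rgb, depth, edge_ratio=1.05):
    """Lift an equirectangular panorama to a triangle mesh.

    rgb:        (H, W, 3) color image.
    depth:      (H, W) per-pixel metric depth.
    edge_ratio: max allowed depth ratio across a triangle; faces
                spanning a larger jump are treated as depth edges.
    """
    H, W = depth.shape
    # Spherical viewing direction for every pixel center.
    theta = (np.arange(W) + 0.5) / W * 2 * np.pi - np.pi   # longitude
    phi = (np.arange(H) + 0.5) / H * np.pi - np.pi / 2     # latitude
    lon, lat = np.meshgrid(theta, phi)
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    vertices = (dirs * depth[..., None]).reshape(-1, 3)    # lifted 3D points
    colors = rgb.reshape(-1, 3)

    # Two triangles per pixel quad.
    idx = np.arange(H * W).reshape(H, W)
    quads = np.stack([idx[:-1, :-1], idx[1:, :-1],
                      idx[:-1, 1:], idx[1:, 1:]], axis=-1).reshape(-1, 4)
    faces = np.concatenate([quads[:, [0, 1, 2]], quads[:, [2, 1, 3]]], axis=0)

    # Depth edge filter: drop faces whose vertex depths differ too much,
    # so surfaces on either side of an occlusion boundary stay disconnected.
    d = depth.reshape(-1)[faces]                           # (F, 3) depths
    keep = d.max(axis=1) / np.maximum(d.min(axis=1), 1e-6) < edge_ratio
    return vertices, colors, faces[keep]
```

Filtering on the depth ratio rather than the absolute difference keeps the test scale-invariant, so near and far surfaces are treated consistently.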
Iterative Mesh Completion
The iterative refinement stage resolves occlusions and enhances mesh quality. The framework searches for the viewpoint with the least view completeness in the scene, then generates new texture and predicts new geometry with a panoramic RGBD inpainter. This inpainter, comprising a panoramic image inpainter and a panoramic depth inpainter, leverages the strong generative capabilities of Stable Diffusion, fine-tuned per scene to maintain style consistency and detail quality.
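The loop can be outlined as follows. This is an illustrative skeleton rather than the authors' code: `sample_poses`, `render_pano`, `inpaint_rgb`, `inpaint_depth`, and `fuse_into_mesh` are hypothetical placeholders for the paper's components:

```python
def complete_mesh(mesh, num_iters=20):
    for _ in range(num_iters):
        # 1. Pick the candidate viewpoint that sees the most missing geometry.
        candidates = sample_poses(mesh)
        pose = min(candidates, key=lambda p: completeness(mesh, p))

        # 2. Render the current mesh there; holes show up in the validity mask.
        rgb, depth, valid = render_pano(mesh, pose)

        # 3. Inpaint color first, then depth conditioned on the new color.
        rgb_filled = inpaint_rgb(rgb, ~valid)       # fine-tuned Stable Diffusion
        depth_filled = inpaint_depth(rgb_filled, depth, ~valid)

        # 4. Fuse the newly synthesized RGBD content back into the mesh.
        mesh = fuse_into_mesh(mesh, rgb_filled, depth_filled, ~valid, pose)
    return mesh

def completeness(mesh, pose):
    # Fraction of panorama pixels the existing mesh already covers from `pose`.
    _, _, valid = render_pano(mesh, pose)
    return valid.mean()
```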
A critical component of the refinement process is the geometry conflict avoidance strategy, which uses mesh rendering to detect and omit conflicting geometry. This ensures that newly added geometry does not interfere with pre-existing content, maintaining view consistency and preventing ghosting artifacts.
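One way such a check could look, sketched with hypothetical `render_depth`/`render_depth_with_ids` helpers and an illustrative tolerance: candidate faces are rendered from previously processed viewpoints, and any face that would land in front of geometry those views already confirmed is discarded:

```python
import numpy as np

def drop_conflicting_faces(new_faces, mesh, seen_poses, tol=0.02):
    """Discard candidate faces that would occlude already-verified geometry."""
    conflicting = set()
    for pose in seen_poses:
        old_depth = render_depth(mesh, pose)                  # existing content
        new_depth, face_id = render_depth_with_ids(new_faces, pose)
        both = np.isfinite(old_depth) & np.isfinite(new_depth)
        # A new face in front of confirmed geometry would produce ghosting
        # in views that were already consistent, so it is marked conflicting.
        bad = both & (new_depth < old_depth * (1 - tol))
        conflicting.update(np.unique(face_id[bad]).tolist())
    return [f for i, f in enumerate(new_faces) if i not in conflicting]
```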
Mesh2GS
The final step converts the refined mesh into a 3D Gaussian Splatting (3DGS) field. Training the 3DGS on the collected pseudo novel views preserves photo-realism and high-quality depth information. This conversion also mitigates the over-smoothing artifacts typically introduced by Poisson surface reconstruction.
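A rough sketch of the conversion, assuming a differentiable splatting renderer (e.g. gsplat or diff-gaussian-rasterization) behind a hypothetical `render_gs` call; the initialization values and the depth-loss weight are illustrative:

```python
import torch

def mesh_to_gs(vertices, colors, pseudo_views, steps=3000, lr=1e-3):
    """Fit a 3DGS field initialized with one Gaussian per mesh vertex."""
    n = len(vertices)
    params = {
        "means":   torch.tensor(vertices, dtype=torch.float32).requires_grad_(True),
        "colors":  torch.tensor(colors, dtype=torch.float32).requires_grad_(True),
        "scales":  torch.full((n, 3), -4.0).requires_grad_(True),   # log-space
        "quats":   torch.cat([torch.ones(n, 1), torch.zeros(n, 3)], 1).requires_grad_(True),
        "opacity": torch.zeros(n, 1).requires_grad_(True),          # logit-space
    }
    opt = torch.optim.Adam(list(params.values()), lr=lr)
    for step in range(steps):
        rgb_gt, depth_gt, pose = pseudo_views[step % len(pseudo_views)]
        rgb, depth = render_gs(params, pose)      # hypothetical splatting call
        # Photometric loss keeps appearance; the depth term (weight chosen
        # here arbitrarily) ties the GS field to the completed mesh geometry.
        loss = (rgb - rgb_gt).abs().mean() + 0.1 * (depth - depth_gt).abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return params
```

Seeding the Gaussians from the colored mesh vertices gives the optimization a geometrically faithful starting point rather than fitting the field from scratch.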
Experimental Evaluation
The authors conduct extensive experiments on the Replica dataset and additional real-world captured panoramas to validate the approach. Pano2Room consistently outperforms state-of-the-art methods such as PERF, Text2Room, and LucidDreamer on the PSNR, SSIM, and LPIPS metrics. Detailed qualitative and quantitative comparisons highlight Pano2Room's superior ability to generate high-fidelity novel views with intricate geometric detail and textural consistency.
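The paper does not state its exact evaluation tooling; a common way to compute these three metrics, using scikit-image and the `lpips` package, is:

```python
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net="alex")  # AlexNet backbone; VGG is another common choice

def evaluate_view(pred, gt):
    """pred, gt: (H, W, 3) float arrays in [0, 1]."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    # LPIPS expects (N, 3, H, W) tensors scaled to [-1, 1].
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    with torch.no_grad():
        lp = lpips_fn(to_t(pred), to_t(gt)).item()
    return psnr, ssim, lp
```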
Implications and Future Directions
The implications of this research are substantial in Augmented Reality (AR) and Virtual Reality (VR), where immersive and photorealistic 3D reconstructions are paramount. The ability to generate detailed 3D models from minimal input, such as a single panorama, opens new possibilities for efficient content creation and scene understanding.
Future developments could involve strengthening the error-correction mechanisms within the iterative refinement process to preemptively address intermediate inaccuracies. Expanding the framework to larger and more complex scenes, such as long corridors or multi-room spaces, would further extend its applicability. Integrating more advanced monocular depth predictors could also improve geometry consistency, especially for reflective and transmissive surfaces, which currently pose challenges.
Conclusion
In summary, Pano2Room presents a robust solution for single-panorama indoor novel view synthesis, establishing a new state-of-the-art with its innovative mesh construction, iterative refinement, and final mesh-to-3DGS conversion processes. The extensive evaluations underscore its capabilities in achieving superior photo-realism and geometric accuracy, making it a significant contribution to the field of 3D scene reconstruction.