Defurnishing with X-Ray Vision: Joint Removal of Furniture from Panoramas and Mesh (2506.05338v2)

Published 5 Jun 2025 in cs.CV

Abstract: We present a pipeline for generating defurnished replicas of indoor spaces represented as textured meshes and corresponding multi-view panoramic images. To achieve this, we first segment and remove furniture from the mesh representation, extend planes, and fill holes, obtaining a simplified defurnished mesh (SDM). This SDM acts as an "X-ray" of the scene's underlying structure, guiding the defurnishing process. We extract Canny edges from depth and normal images rendered from the SDM. We then use these as a guide to remove the furniture from panorama images via ControlNet inpainting. This control signal ensures the availability of global geometric information that may be hidden from a particular panoramic view by the furniture being removed. The inpainted panoramas are used to texture the mesh. We show that our approach produces higher quality assets than methods that rely on neural radiance fields, which tend to produce blurry low-resolution images, or RGB-D inpainting, which is highly susceptible to hallucinations.
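The edge-extraction step in the abstract can be illustrated with a minimal, dependency-free Canny detector (Gaussian blur, Sobel gradients, non-maximum suppression, hysteresis) applied to images rendered from the SDM. This is a sketch, not the authors' code: the function names canny and control_edges, the thresholds, the sigma value, and the choice to take the union of depth edges and per-channel normal edges are all assumptions. A real pipeline would more likely call an existing implementation such as OpenCV's cv2.Canny.

```python
import math
from collections import deque


def _clamp(v, lo, hi):
    return max(lo, min(hi, v))


def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur of a 2D list; borders are handled by clamping."""
    r = max(1, int(round(2 * sigma)))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    s = sum(k)
    k = [v / s for v in k]
    h, w = len(img), len(img[0])
    tmp = [[sum(k[i + r] * img[y][_clamp(x + i, 0, w - 1)] for i in range(-r, r + 1))
            for x in range(w)] for y in range(h)]
    return [[sum(k[i + r] * tmp[_clamp(y + i, 0, h - 1)][x] for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def canny(img, low, high, sigma=1.0):
    """Binary edge map (nested lists of 0/1) for a single-channel image."""
    g = gaussian_blur(img, sigma)
    h, w = len(g), len(g[0])

    def px(y, x):
        return g[_clamp(y, 0, h - 1)][_clamp(x, 0, w - 1)]

    # 1. Sobel gradient magnitude and direction, folded into [0, 180) degrees.
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = (px(y - 1, x + 1) + 2 * px(y, x + 1) + px(y + 1, x + 1)
                  - px(y - 1, x - 1) - 2 * px(y, x - 1) - px(y + 1, x - 1))
            gy = (px(y + 1, x - 1) + 2 * px(y + 1, x) + px(y + 1, x + 1)
                  - px(y - 1, x - 1) - 2 * px(y - 1, x) - px(y - 1, x + 1))
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0

    # 2. Non-maximum suppression: keep pixels that peak along the gradient direction.
    thin = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = ang[y][x]
            if a < 22.5 or a >= 157.5:
                n1, n2 = mag[y][x - 1], mag[y][x + 1]
            elif a < 67.5:
                n1, n2 = mag[y - 1][x - 1], mag[y + 1][x + 1]
            elif a < 112.5:
                n1, n2 = mag[y - 1][x], mag[y + 1][x]
            else:
                n1, n2 = mag[y - 1][x + 1], mag[y + 1][x - 1]
            if mag[y][x] >= n1 and mag[y][x] >= n2:
                thin[y][x] = mag[y][x]

    # 3. Hysteresis: seed from strong pixels, grow through 8-connected weak pixels.
    edges = [[0] * w for _ in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w) if thin[y][x] >= high)
    for y, x in queue:
        edges[y][x] = 1
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not edges[ny][nx]
                        and thin[ny][nx] >= low):
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges


def control_edges(depth, normal_channels, depth_thr, normal_thr):
    """Hypothetical control image: union of depth edges and per-channel normal edges.

    depth_thr and normal_thr are (low, high) pairs; normal components lie in
    [-1, 1], so they usually need smaller thresholds than metric depth.
    """
    maps = [canny(depth, *depth_thr)] + [canny(ch, *normal_thr) for ch in normal_channels]
    h, w = len(depth), len(depth[0])
    return [[int(any(m[y][x] for m in maps)) for x in range(w)] for y in range(h)]
```

On a depth render, a wall standing in front of another surface appears as a step, and the detector marks a thin line at that step; creases between walls and floor, where depth is continuous but orientation changes, show up mainly in the normal channels, which is why both inputs are combined.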

Summary

The paper introduces a pipeline that uses a simplified defurnished mesh (SDM) as a virtual "X-ray" to remove furniture from both the panoramas and the textured mesh of an indoor scene.
It combines geometric priors with Canny edge guidance to improve multi-view consistency and reduce artifacts, with experiments on datasets such as Matterport3D and ScanNet.
The authors report that the method is faster than radiance-field-based approaches and yields more precise, higher-quality defurnished scenes.

The paper presents a pipeline for defurnishing indoor spaces that operates jointly on multi-view panoramic images and the corresponding textured 3D mesh. The goal is to remove furniture and clutter, enabling applications such as virtual staging and digital twin creation. The method derives geometric priors from a simplified defurnished mesh (SDM) and uses them to condition ControlNet-based inpainting, which yields higher-quality results than existing techniques for removing objects from 3D scenes.
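The summary does not say how the region to inpaint is determined in each panorama. One plausible approach, assumed here purely for illustration, renders depth from the same panorama viewpoint for both the furnished mesh and the SDM and masks pixels where the furnished surface is noticeably closer, since that is where removed furniture stood in front of the room's structure. The function name furniture_mask and the tau and dilate parameters are hypothetical. The dilation wraps horizontally because an equirectangular panorama is periodic in longitude.

```python
def furniture_mask(depth_furnished, depth_sdm, tau=0.05, dilate=1):
    """Hypothetical inpainting mask for one equirectangular panorama.

    A pixel is masked when the furnished scene is closer to the camera than the
    SDM by more than `tau` (same units as the depth maps).
    """
    h, w = len(depth_sdm), len(depth_sdm[0])
    mask = [[depth_sdm[y][x] - depth_furnished[y][x] > tau for x in range(w)]
            for y in range(h)]
    # Grow the mask by one pixel per iteration to cover soft object borders.
    for _ in range(dilate):
        grown = [row[:] for row in mask]
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue
                for dy in (-1, 0, 1):
                    ny = y + dy
                    if not 0 <= ny < h:
                        continue  # latitude does not wrap across the poles
                    for dx in (-1, 0, 1):
                        grown[ny][(x + dx) % w] = True  # longitude wraps at the seam
        mask = grown
    return mask
```

A small dilation is a common safeguard so that object outlines and contact shadows fall inside the inpainted region rather than surviving as halos.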

The SDM acts as a virtual "X-ray" that reveals the scene's underlying structure. This contrasts with methods based on neural radiance fields, which tend to produce blurry, low-resolution images, and with RGB-D inpainting, which is prone to hallucinations, especially in heavily cluttered scenes. To build the SDM, furniture is segmented and removed from the mesh, the remaining planes are extended, and holes are filled. Canny edges extracted from depth and normal images rendered from the SDM then guide the inpainting of each panorama. Because these edges come from a single shared geometry, they supply structural information that furniture may hide in any particular view, which promotes consistency across views.
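The plane-extension step can be pictured as fitting a plane to the wall or floor vertices that remain after furniture removal and then intersecting camera rays with that plane to recover the surface hidden behind furniture. The sketch below is illustrative, not the authors' procedure: it uses a least-squares fit that regresses along the coordinate axis the normal is most aligned with (a lightweight alternative to an eigen-decomposition of the covariance), and the names fit_plane and ray_plane_distance are hypothetical.

```python
import math


def fit_plane(points):
    """Least-squares plane through 3D points as (unit normal n, offset d), with n . p = d.

    Regressing along the dominant normal axis avoids the degeneracy of always
    fitting z = a*x + b*y, which fails for vertical walls.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    xx = xy = xz = yy = yz = zz = 0.0
    for x, y, z in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        xx += dx * dx
        xy += dx * dy
        xz += dx * dz
        yy += dy * dy
        yz += dy * dz
        zz += dz * dz
    # Determinants of the 2x2 normal equations for regressing x, y or z.
    det_x = yy * zz - yz * yz
    det_y = xx * zz - xz * xz
    det_z = xx * yy - xy * xy
    det_max = max(det_x, det_y, det_z)
    if det_max <= 0.0:
        raise ValueError("points are collinear or coincident")
    if det_max == det_x:
        normal = (det_x, xz * yz - xy * zz, xy * yz - xz * yy)
    elif det_max == det_y:
        normal = (xz * yz - xy * zz, det_y, xy * xz - yz * xx)
    else:
        normal = (xy * yz - xz * yy, xy * xz - yz * xx, det_z)
    length = math.sqrt(sum(c * c for c in normal))
    nx, ny, nz = (c / length for c in normal)
    return (nx, ny, nz), nx * cx + ny * cy + nz * cz


def ray_plane_distance(origin, direction, plane, eps=1e-9):
    """t such that origin + t * direction lies on the plane, or None if parallel or behind.

    With a unit-length direction, t is the distance to the extended plane.
    """
    (nx, ny, nz), d = plane
    denom = nx * direction[0] + ny * direction[1] + nz * direction[2]
    if abs(denom) < eps:
        return None
    t = (d - (nx * origin[0] + ny * origin[1] + nz * origin[2])) / denom
    return t if t > 0.0 else None
```

For a panorama pixel whose ray hit furniture in the original mesh, the value returned for the fitted wall plane is the depth the SDM would report there, which is the kind of hidden geometry the "X-ray" is meant to expose.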

The paper evaluates the pipeline on datasets such as Matterport3D and ScanNet and compares it with radiance-field-based models. Although such models are well suited to 3D scene reconstruction, they produce artifacts and inconsistent outputs when objects are removed. The SDM-based method yields more precise geometry, which makes it well suited to applications that require accurate output geometry.

The research highlights several advantages:

Consistency and Artifact-Free Results: Combining the SDM with ControlNet inpainting makes furniture removal and hole filling robust, particularly in cluttered environments.
Faster Processing Times: The pipeline runs faster than traditional 3D-based inpainting techniques.
Adaptability to Diverse Scenes: Because inpainting is grounded in geometric priors rather than in semantic cues alone, the pipeline handles diverse scene types more reliably.

While showcasing promising results, the paper acknowledges some limitations:

Hallucinations and Spurious Modifications: The paper notes that current Stable Diffusion (SD) inpainting models can still hallucinate objects or make spurious changes to the image.
Multi-View Consistency: Regions that are occluded in some views can be inpainted inconsistently across views; the geometric guidance from the SDM mitigates this issue but does not eliminate it.
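Checking multi-view consistency, and texturing the mesh from the inpainted panoramas, both require projecting a 3D point into each panorama and testing whether it is occluded there. The sketch assumes an equirectangular model with z up, longitude 0 along +x, and no camera rotation (real datasets use their own axis conventions), and it uses a per-pixel distance map, for example one rendered from the SDM, for the occlusion test. The helper names and the relative tolerance are hypothetical.

```python
import math


def project_equirect(point, cam_pos, width, height):
    """Map a 3D point to (u, v, distance) in an equirectangular panorama.

    Assumed convention: z is up, longitude 0 looks along +x, no camera rotation;
    u grows with longitude and v grows downward from the zenith.
    """
    dx, dy, dz = (point[i] - cam_pos[i] for i in range(3))
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    if dist == 0.0:
        raise ValueError("point coincides with the camera centre")
    lon = math.atan2(dy, dx)                          # (-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, dz / dist)))   # [-pi/2, pi/2]
    u = (lon + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - lat) / math.pi * height
    return u, v, dist


def visible_samples(point, views, rel_tol=0.05):
    """Colors of `point` from every panorama in which it is not occluded.

    Each view is a dict with 'pos' (camera centre), 'depth' (per-pixel distance
    to the first surface) and 'rgb' (per-pixel color), as height x width lists.
    """
    samples = []
    for view in views:
        depth, rgb = view["depth"], view["rgb"]
        h, w = len(depth), len(depth[0])
        u, v, dist = project_equirect(point, view["pos"], w, h)
        x = int(u) % w            # longitude wraps around the seam
        y = min(int(v), h - 1)    # v == h only at the nadir
        # Visible if the first surface along this ray is (roughly) the point itself.
        if abs(depth[y][x] - dist) <= rel_tol * dist:
            samples.append(rgb[y][x])
    return samples


def color_spread(samples):
    """Largest per-channel disagreement between samples (0 if fewer than two)."""
    if len(samples) < 2:
        return 0
    return max(max(s[c] for s in samples) - min(s[c] for s in samples)
               for c in range(len(samples[0])))
```

A large spread among the visible samples of one surface point flags a region that was inpainted differently in different panoramas; when texturing, the same visibility test decides which panoramas may contribute color to that point.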

In conclusion, the authors make a compelling case for defurnishing guided by geometric priors rather than by purely image-based techniques. The work has implications beyond traditional real estate applications, for example in rendering and facility management, and contributes to 3D scene understanding. Future research might explore strategies for improving image consistency across multiple views, or use the attention mechanisms of existing deep learning architectures to further refine outputs in complex environments.