3D-consistent generative object insertion in scenes

Develop methods to generate and insert new objects into 3D scenes in a multiview-consistent manner, enabling arbitrary placement beyond cases constrained by strong spatial priors (such as placing a hat on a head or a mustache on a face).

Background

Recent 3D scene editing approaches often rely on applying 2D diffusion-based edits across multiple views of a scene’s neural radiance field (NeRF). While effective for global style and appearance modifications, these methods struggle to create and place new objects consistently across viewpoints, largely due to multiview inconsistency and limited spatial control.
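The multiview-inconsistency issue can be seen in the generic "render, edit in 2D, refit" loop that such methods follow. The sketch below is a minimal, hypothetical illustration of that loop, not the method of InseRF or any specific system; `ToyRadianceField` and `edit_view_2d` are placeholder stand-ins for a trained NeRF and a 2D diffusion editor.

```python
import numpy as np

# Hypothetical stand-ins: a real pipeline would use a NeRF implementation
# and an instruction-guided 2D diffusion editing model.
class ToyRadianceField:
    def render(self, camera_pose):
        # Placeholder: return a dummy H x W x 3 image for this viewpoint.
        rng = np.random.default_rng(abs(hash(camera_pose)) % (2**32))
        return rng.random((64, 64, 3))

    def fit(self, posed_images):
        # Placeholder: re-optimize the radiance field against the edited views.
        pass

def edit_view_2d(image, prompt):
    # Placeholder for a per-view 2D diffusion edit.
    return image

def edit_scene(field, camera_poses, prompt, num_rounds=3):
    """Render each view, edit it in 2D, refit the field; repeat."""
    for _ in range(num_rounds):
        edited = [(pose, edit_view_2d(field.render(pose), prompt))
                  for pose in camera_poses]
        # Each view is edited independently, so the edits can disagree across
        # viewpoints; the refit step must reconcile them, which is where
        # inserting a new object without strong spatial priors tends to fail.
        field.fit(edited)
    return field

if __name__ == "__main__":
    edit_scene(ToyRadianceField(), ("cam0", "cam1", "cam2"), prompt="add a vase")
```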

Existing works primarily tackle object removal or inpainting, or require explicit 3D inputs (e.g., 3D boxes or multi-view masks). As a result, generating and inserting new objects in complex scenes, with consistent geometry and appearance across views and without strong spatial priors, remains insufficiently addressed.

References

However, generating and inserting new objects in scenes in a 3D-consistent way remains an open problem and is mainly limited to cases where edits are strongly constrained by spatial priors (e.g. putting a hat on a head or a mustache on a face).

InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes (2401.05335 - Shahbazi et al., 10 Jan 2024) in Section 1: Introduction