Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects (2401.05236v1)

Published 10 Jan 2024 in cs.CV

Abstract: Our world is full of identical objects (e.g., cans of Coke, cars of the same model). When seen together, these duplicates provide additional, strong cues for effective 3D reasoning. Inspired by this observation, we introduce Structure from Duplicates (SfD), a novel inverse graphics framework that reconstructs geometry, material, and illumination from a single image containing multiple identical objects. SfD begins by identifying multiple instances of an object within an image, and then jointly estimates the 6-DoF pose for all instances. An inverse graphics pipeline is subsequently employed to jointly reason about the shape and material of the object and the environment light, while adhering to shared geometry and material constraints across instances. Our primary contributions are utilizing object duplicates as a robust prior for single-image inverse graphics and proposing an in-plane rotation-robust Structure from Motion (SfM) formulation for joint 6-DoF object pose estimation. By leveraging multi-view cues from a single image, SfD produces more realistic and detailed 3D reconstructions, significantly outperforming existing single-image reconstruction models and multi-view reconstruction approaches given a similar or greater number of observations.


Summary

  • The paper introduces a novel inverse graphics framework that exploits duplicate objects in a single image to reconstruct 3D structure.
  • It employs a rotation-robust pose estimation module combined with geometric reconstruction to align multiple instances as virtual multi-view observations.
  • Empirical results on synthetic and real-world data demonstrate superior reconstruction accuracy compared to traditional multi-view methods.

Overview of Structure from Duplicates

The field of computer vision has long grappled with the challenge of inverse rendering, the process of deducing an object's 3D structure, material properties, and lighting from images. This task is particularly difficult when only a single image is available. A new inverse graphics framework named Structure from Duplicates (SfD) addresses this challenge using an innovative approach: exploiting the presence of identical objects within a single image.

The Key Insight and Methodology

SfD operates on the principle that identical objects seen from different angles offer rich cues akin to viewing a single object from multiple perspectives. The process begins with identifying such objects in an image. The framework then estimates each object's pose using a rotation-robust pose estimation method, setting the stage for 3D reconstruction.

Central to SfD is a dual-module design: a pose estimation module that remains robust even when instances are rotated within the image plane, and a geometric reconstruction module that recovers the object's detailed shape. Aligning all instances in a shared canonical frame treats each one as if it were observed by a different virtual camera, translating a single-view multi-object scenario into a multi-view single-object problem.
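The instance-as-virtual-camera idea can be made concrete: if instance i has an object-to-world pose T_i and the real camera has world-to-camera extrinsic E, then a canonical point p on instance i projects through E @ T_i @ p, so E @ T_i acts as the extrinsic of a virtual camera observing the single canonical object. A minimal numpy sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def virtual_camera_extrinsics(cam_extrinsic, instance_poses):
    """Turn each duplicate's 6-DoF pose into a virtual camera.

    cam_extrinsic: (4, 4) world-to-camera transform of the real camera.
    instance_poses: list of (4, 4) object-to-world transforms, one per
        duplicate instance in the image.

    Since a canonical point p on instance i maps to the camera via
    cam_extrinsic @ T_i @ p, the product cam_extrinsic @ T_i is the
    extrinsic of a virtual camera viewing the canonical object.
    """
    return [cam_extrinsic @ T for T in instance_poses]

# Example: two duplicates, the second rotated 90 degrees about z.
E = np.eye(4)                      # real camera at the world origin
Rz = np.array([[0., -1., 0., 0.],
               [1.,  0., 0., 0.],
               [0.,  0., 1., 0.],
               [0.,  0., 0., 1.]])
virtuals = virtual_camera_extrinsics(E, [np.eye(4), Rz])
```

Two duplicates in one photograph thus contribute two virtual viewpoints of the same canonical object, which is why the problem reduces to multi-view reconstruction.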

Innovations and Advantages

This research stands out by using object duplicates in a single image as a powerful prior for inverse rendering. SfD generates highly realistic 3D reconstructions that surpass traditional models, even when those models are given more data through multi-view observations. Notably, SfD's improved accuracy is not confined to synthetic images; it extends to real-world examples as well.

Empirical Evidence and Applications

The effectiveness of SfD has been validated on a new dataset of synthetic and real-world images featuring duplicated objects. The model infers detail that rivals or exceeds that of existing frameworks requiring more extensive multi-view inputs. These capabilities open up possibilities for relighting, material editing, and seamlessly integrating new objects into previously captured scenes.
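Relighting follows directly from the decomposition: once albedo, normals, and illumination are separated, the surface can be re-shaded under a new light. A minimal sketch using simple Lambertian shading (the paper itself uses a physically based material model; this simplified shading and all names here are illustrative assumptions):

```python
import numpy as np

def relight_lambertian(albedo, normals, light_dir, light_color):
    """Re-shade a decomposed surface under a new directional light.

    albedo: (H, W, 3) per-pixel diffuse reflectance from the inverse pass.
    normals: (H, W, 3) unit surface normals.
    light_dir: (3,) unit vector pointing toward the light.
    light_color: (3,) RGB light intensity.
    """
    # Cosine shading term, clamped so back-facing pixels receive no light.
    shading = np.clip(normals @ light_dir, 0.0, None)          # (H, W)
    return albedo * shading[..., None] * light_color

# Toy example: a flat 2x2 patch facing +z, lit head-on.
albedo = np.full((2, 2, 3), 0.5)
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0
img = relight_lambertian(albedo, normals,
                         np.array([0., 0., 1.]),
                         np.array([1.0, 0.9, 0.8]))
```

Because geometry and material are shared across all duplicates, a single recovered asset can also be duplicated again and composited into new scenes under arbitrary lighting.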

Future Horizons and Challenges

Despite its strengths, SfD is currently designed for scenes with nearly identical objects and relies on accurate instance segmentation masks. Performance could be improved by incorporating methods that tolerate minor variations among object instances and by refining pose estimates. Future work could also extend the framework to handle the geometry of unseen regions, a common limitation of neural field methodologies.

In conclusion, Structure from Duplicates paves the way for significant advancements in single-image inverse rendering, with the potential to transform applications across computer vision, graphics, and robotics.
