Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects (2401.05236v1)
Abstract: Our world is full of identical objects (e.g., cans of coke, cars of the same model). These duplicates, when seen together, provide additional and strong cues for us to effectively reason about 3D. Inspired by this observation, we introduce Structure from Duplicates (SfD), a novel inverse graphics framework that reconstructs geometry, material, and illumination from a single image containing multiple identical objects. SfD begins by identifying multiple instances of an object within an image, and then jointly estimates the 6-DoF pose for all instances. An inverse graphics pipeline is subsequently employed to jointly reason about the shape and material of the object and the environment light, while adhering to the shared geometry and material constraint across instances. Our primary contributions involve utilizing object duplicates as a robust prior for single-image inverse graphics and proposing an in-plane rotation-robust Structure from Motion (SfM) formulation for joint 6-DoF object pose estimation. By leveraging multi-view cues from a single image, SfD generates more realistic and detailed 3D reconstructions, significantly outperforming existing single-image reconstruction models and multi-view reconstruction approaches with a similar or greater number of observations.
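The multi-view cue mentioned in the abstract can be illustrated with a small geometric sketch. It is not the authors' implementation; it only shows the standard rigid-transform identity that, if each duplicate has an object-to-camera pose (R, t), inverting that pose gives a virtual camera expressed in the shared object frame, so N duplicates in one image behave like N views of one object. The function names and the three example poses are hypothetical values chosen for illustration.

```python
import math


def rotation_y(theta):
    """3x3 rotation about the y-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def camera_in_object_frame(R, t):
    """Invert an object-to-camera pose, where x_cam = R @ x_obj + t.

    Returns (R^T, -R^T t): the camera orientation and camera center
    expressed in the object's canonical frame.
    """
    R_inv = [[R[j][i] for j in range(3)] for i in range(3)]  # transpose
    center = [-sum(R_inv[i][k] * t[k] for k in range(3)) for i in range(3)]
    return R_inv, center


# Hypothetical 6-DoF poses for three duplicates seen in a single image.
instance_poses = [
    (rotation_y(0.0), [-0.3, 0.0, 2.0]),
    (rotation_y(math.pi / 2), [0.0, 0.0, 2.5]),
    (rotation_y(math.pi), [0.3, 0.0, 2.0]),
]

# One virtual camera per duplicate, all observing the same canonical object.
virtual_cameras = [camera_in_object_frame(R, t) for R, t in instance_poses]

for index, (_, center) in enumerate(virtual_cameras):
    print(f"virtual camera {index}: center = {center}")
```

Because the duplicates share geometry and material, these virtual views could in principle be passed to a multi-view reconstruction stage; how SfD actually estimates the poses and performs the joint optimization is described in the paper itself.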