Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects (2401.05236v1)

Published 10 Jan 2024 in cs.CV

Abstract: Our world is full of identical objects (e.g., cans of Coke, cars of the same model). When seen together, these duplicates provide additional, strong cues for effective 3D reasoning. Inspired by this observation, we introduce Structure from Duplicates (SfD), a novel inverse graphics framework that reconstructs geometry, material, and illumination from a single image containing multiple identical objects. SfD begins by identifying multiple instances of an object within an image, and then jointly estimates the 6-DoF pose for all instances. An inverse graphics pipeline is subsequently employed to jointly reason about the shape and material of the object and the environment light, while adhering to shared geometry and material constraints across instances. Our primary contributions are utilizing object duplicates as a robust prior for single-image inverse graphics and proposing an in-plane rotation-robust Structure from Motion (SfM) formulation for joint 6-DoF object pose estimation. By leveraging multi-view cues from a single image, SfD produces more realistic and detailed 3D reconstructions, significantly outperforming existing single-image reconstruction models and multi-view reconstruction approaches given a similar or greater number of observations.


Summary

  • The paper introduces a novel inverse graphics framework that exploits duplicate objects in a single image to reconstruct 3D structure.
  • It employs a rotation-robust pose estimation module combined with geometric reconstruction to align multiple instances as virtual multi-view observations.
  • Empirical results on synthetic and real-world data demonstrate superior reconstruction accuracy compared to traditional multi-view methods.

Overview of Structure from Duplicates

The field of computer vision has long grappled with the challenge of inverse rendering, the process of deducing an object's 3D structure, material properties, and lighting from images. This task is particularly difficult when only a single image is available. A new inverse graphics framework named Structure from Duplicates (SfD) addresses this challenge using an innovative approach: exploiting the presence of identical objects within a single image.

The Key Insight and Methodology

SfD operates on the principle that identical objects seen from different angles offer rich cues akin to viewing a single object from multiple perspectives. The process begins with identifying such objects in an image. The framework then estimates each object's pose using a rotation-robust pose estimation method, setting the stage for 3D reconstruction.

Central to SfD is a dual-module design: a pose estimation module that remains robust even when instances are rotated within the image plane, and a geometric reconstruction module that recovers the object's detailed shape. Aligning all instances in a shared canonical frame treats each one as if it were observed by a different virtual camera, translating a single-view multi-object scenario into a multi-view single-object problem.
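The instance-as-virtual-camera idea can be made concrete: if instance i has an object-to-world pose T_i and the real camera has world-to-camera extrinsic E, then a canonical point p on instance i projects through E @ T_i @ p, so E @ T_i acts as the extrinsic of a virtual camera observing the single canonical object. A minimal numpy sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def virtual_camera_extrinsics(cam_extrinsic, instance_poses):
    """Turn each duplicate's 6-DoF pose into a virtual camera.

    cam_extrinsic: (4, 4) world-to-camera transform of the real camera.
    instance_poses: list of (4, 4) object-to-world transforms, one per
        duplicate instance in the image.

    Since a canonical point p on instance i maps to the camera via
    cam_extrinsic @ T_i @ p, the product cam_extrinsic @ T_i is the
    extrinsic of a virtual camera viewing the canonical object.
    """
    return [cam_extrinsic @ T for T in instance_poses]

# Example: two duplicates, the second rotated 90 degrees about z.
E = np.eye(4)                      # real camera at the world origin
Rz = np.array([[0., -1., 0., 0.],
               [1.,  0., 0., 0.],
               [0.,  0., 1., 0.],
               [0.,  0., 0., 1.]])
virtuals = virtual_camera_extrinsics(E, [np.eye(4), Rz])
```

Two duplicates in one photograph thus contribute two virtual viewpoints of the same canonical object, which is why the problem reduces to multi-view reconstruction.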

Innovations and Advantages

This research stands out by using object duplicates in a single image as a powerful prior for inverse rendering. SfD generates highly realistic 3D reconstructions that surpass traditional models, even when those models are given more data through multi-view observations. Notably, SfD's improved accuracy is not confined to synthetic images; it extends to real-world examples as well.

Empirical Evidence and Applications

The effectiveness of SfD has been validated on a new dataset of synthetic and real-world images featuring duplicated objects. The model infers detail that rivals or exceeds that of existing frameworks requiring more extensive multi-view inputs. These capabilities open up possibilities for relighting, material editing, and seamlessly integrating new objects into previously captured scenes.
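Relighting follows directly from the decomposition: once albedo, normals, and illumination are separated, the surface can be re-shaded under a new light. A minimal sketch using simple Lambertian shading (the paper itself uses a physically based material model; this simplified shading and all names here are illustrative assumptions):

```python
import numpy as np

def relight_lambertian(albedo, normals, light_dir, light_color):
    """Re-shade a decomposed surface under a new directional light.

    albedo: (H, W, 3) per-pixel diffuse reflectance from the inverse pass.
    normals: (H, W, 3) unit surface normals.
    light_dir: (3,) unit vector pointing toward the light.
    light_color: (3,) RGB light intensity.
    """
    # Cosine shading term, clamped so back-facing pixels receive no light.
    shading = np.clip(normals @ light_dir, 0.0, None)          # (H, W)
    return albedo * shading[..., None] * light_color

# Toy example: a flat 2x2 patch facing +z, lit head-on.
albedo = np.full((2, 2, 3), 0.5)
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0
img = relight_lambertian(albedo, normals,
                         np.array([0., 0., 1.]),
                         np.array([1.0, 0.9, 0.8]))
```

Because geometry and material are shared across all duplicates, a single recovered asset can also be duplicated again and composited into new scenes under arbitrary lighting.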

Future Horizons and Challenges

Despite its strengths, SfD is currently designed for scenes with nearly identical objects and relies on accurate instance segmentation masks. Performance could be improved by incorporating methods that tolerate minor variations among object instances and by refining pose estimates. Future work could also extend the framework to handle the geometry of unseen regions, a common limitation of neural field methodologies.

In conclusion, Structure from Duplicates paves the way for significant advancements in single-image inverse rendering, with the potential to transform applications across computer vision, graphics, and robotics.
