
CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion (2305.16283v5)

Published 25 May 2023 in cs.CV

Abstract: Controllable scene synthesis aims to create interactive environments for various industrial use cases. Scene graphs provide a highly suitable interface to facilitate these applications by abstracting the scene context in a compact manner. Existing methods, reliant on retrieval from extensive databases or pre-trained shape embeddings, often overlook scene-object and object-object relationships, leading to inconsistent results due to their limited generation capacity. To address this issue, we present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes, which are semantically realistic and conform to commonsense. Our pipeline consists of two branches: one predicts the overall scene layout via a variational auto-encoder, and the other generates compatible shapes via latent diffusion. Together they capture global scene-object and local inter-object relationships in the scene graph while preserving shape diversity. The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model. Because no existing scene graph dataset offers high-quality object-level meshes with relations, we also construct SG-FRONT, enriching the off-the-shelf indoor dataset 3D-FRONT with additional scene graph labels. Extensive experiments are conducted on SG-FRONT, where CommonScenes shows clear advantages over other methods regarding generation consistency, quality, and diversity. Code and the dataset will be released upon acceptance.

Authors (7)
  1. Guangyao Zhai (26 papers)
  2. Evin Pınar Örnek (14 papers)
  3. Shun-Cheng Wu (11 papers)
  4. Yan Di (28 papers)
  5. Federico Tombari (214 papers)
  6. Nassir Navab (459 papers)
  7. Benjamin Busam (82 papers)