• Locally conditioned diffusion is an approach to compositional scene diffusion that lets users control the semantic parts of a scene with text prompts and bounding boxes.
• Used within a score distillation sampling-based pipeline, the method generates higher-fidelity 3D scenes than existing approaches.
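
To make the idea of per-region control concrete, here is a minimal sketch of one way text prompts and bounding boxes could steer a single denoising step: each box selects which prompt's noise prediction is used inside it. The names `region_masks`, `compose_noise`, and `toy_denoiser` are hypothetical, the pixel-box format is an assumption, and the toy denoiser only stands in for a real text-conditioned diffusion model.

```python
import numpy as np

def region_masks(height, width, boxes):
    """Build one binary mask per prompt from pixel boxes (x0, y0, x1, y1).

    Index 0 is the background; box i claims index i + 1, and later boxes
    overwrite earlier ones, so every pixel belongs to exactly one mask.
    """
    labels = np.zeros((height, width), dtype=int)
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        labels[y0:y1, x0:x1] = i + 1
    return [(labels == k).astype(float) for k in range(len(boxes) + 1)]

def compose_noise(x_t, t, prompts, masks, denoiser):
    """Blend per-prompt noise predictions: sum_i mask_i * eps(x_t, t, prompt_i)."""
    eps = np.zeros_like(x_t)
    for prompt, mask in zip(prompts, masks):
        eps += mask * denoiser(x_t, t, prompt)
    return eps

def toy_denoiser(x_t, t, prompt):
    # Hypothetical stand-in for a real diffusion model: returns a constant
    # field whose value depends only on the prompt length.
    return np.full_like(x_t, float(len(prompt)))

# One box covering the left half of a 4x4 image; the rest is background.
masks = region_masks(4, 4, boxes=[(0, 0, 2, 4)])
x_t = np.zeros((4, 4))
eps = compose_noise(x_t, t=10, prompts=["sky", "a castle"],
                    masks=masks, denoiser=toy_denoiser)
```

Because the masks partition the image, every pixel receives exactly one prompt's prediction. In this sketch, any smooth blending across box edges would have to come from the denoiser seeing the whole noisy image at each step.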

Key terms:

  • Compositional scene diffusion: The process of generating complex 3D scenes by combining multiple components or objects.
  • Locally conditioned diffusion: An approach that provides control over semantic parts of a 3D scene using text prompts and bounding boxes, ensuring seamless transitions between parts.
  • Text prompts: Input text that helps guide the generation of a 3D scene.
  • Bounding boxes: Rectangular regions that define the boundaries of objects within a 3D scene.
  • Score distillation sampling: An optimization technique for text-to-3D synthesis that uses a pretrained 2D diffusion model to guide updates to a 3D representation; here, the representation being optimized is a Voxel NeRF.
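
For reference, the score distillation sampling gradient is commonly written as follows in the text-to-3D literature. The notation is an assumption and does not come from this summary: \theta denotes the parameters of the 3D representation, x = g(\theta) a rendered view, x_t its noised version at timestep t, y the text prompt, \hat{\epsilon}_\phi the pretrained diffusion model's noise prediction, and w(t) a timestep weighting.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x = g(\theta)
```

Under this reading, a compositional variant would presumably swap the single-prompt prediction \hat{\epsilon}_\phi(x_t; y, t) for a prediction assembled from several prompts, one per bounding-box region.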

