- The paper presents a dual-stage generative framework that separates part structure planning from spatially-conditioned part synthesis for controllable 3D generation.
- It employs an autoregressive transformer with mask guidance to predict explicit 3D part bounding boxes, ensuring semantic decoupling and structural cohesion.
- Evaluated on a dataset of 180K annotated objects, OmniPart achieves superior geometric and semantic fidelity, enabling applications like compositional editing and material customization.
OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
The paper introduces OmniPart, a framework for part-aware 3D object generation. It emphasizes semantic decoupling and structural cohesion to facilitate the creation of 3D models in interactive applications. Using a two-stage approach, OmniPart generates complex 3D assets with explicit, editable part structures.
Part Structure Planning and Generation
OmniPart's distinctive contribution lies in its two-stage generative framework. The first stage performs structure planning: an autoregressive module generates 3D part bounding boxes, guided by 2D masks that give intuitive control over part decomposition. These masks can be drawn manually by users or extracted with pre-trained segmentation models such as SAM. The resulting bounding boxes serve as spatial guides for assembling 3D parts.
Figure 1: An overview of the OmniPart model design. OmniPart generates part-aware, controllable, and high-quality 3D content through two key stages: part structure planning and structured part latent generation.
The second stage employs spatially-conditioned generation to synthesize high-quality 3D parts simultaneously. This stage adapts a pre-trained holistic generator, TRELLIS, to produce parts with enhanced semantic awareness and structural coherence.
Implementation and Technical Details
OmniPart's implementation hinges on two core modules: the Controllable Structure Planning and the Spatially-Conditioned Part Synthesis. The structure planning module uses an autoregressive transformer to predict part layouts as bounding boxes. These predictions incorporate flexible mask-based conditions to accommodate varying part granularity or decomposition schemes.
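To make the idea of predicting boxes autoregressively more concrete, the following is a minimal sketch of one common way to turn bounding boxes into a discrete token sequence that a transformer could emit one token at a time. The summary does not specify OmniPart's tokenization, so the 64-bin resolution, the `[xmin, ymin, zmin, xmax, ymax, zmax]` ordering, and the function names `boxes_to_tokens` and `tokens_to_boxes` are all assumptions made for illustration.

```python
import numpy as np

NUM_BINS = 64  # assumed quantization resolution, not taken from the paper


def boxes_to_tokens(boxes, num_bins=NUM_BINS):
    """Quantize boxes into a flat integer token sequence.

    boxes: (N, 6) array of [xmin, ymin, zmin, xmax, ymax, zmax],
           with coordinates normalized to [0, 1].
    Returns a 1-D int array of length 6 * N.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    # Map each coordinate to a bin index; clip so that 1.0 falls in the last bin.
    tokens = np.clip(np.floor(boxes * num_bins), 0, num_bins - 1).astype(np.int64)
    return tokens.reshape(-1)


def tokens_to_boxes(tokens, num_bins=NUM_BINS):
    """Map a flat token sequence back to approximate boxes (bin centres)."""
    tokens = np.asarray(tokens, dtype=np.float64).reshape(-1, 6)
    return (tokens + 0.5) / num_bins


# Hypothetical two-part layout in a normalized unit cube.
example_boxes = np.array([
    [0.0, 0.0, 0.0, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 1.0, 1.0, 1.0],
])
example_tokens = boxes_to_tokens(example_boxes)
recovered_boxes = tokens_to_boxes(example_tokens)
```

Under this scheme each part contributes six tokens, so a layout with a variable number of parts becomes a variable-length sequence, which is the kind of output an autoregressive model handles naturally. The round trip loses at most half a bin width per coordinate.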
The part synthesis module adapts TRELLIS by conditioning voxel-based regions within identified bounding boxes, and integrating part-aware embeddings to mediate local-global consistency. To achieve detailed outputs with limited annotations, a voxel discarding mechanism is introduced, which identifies and filters extraneous voxels early in the denoising process.
Figure 2: Spatially-conditioned part synthesis. Consistent generation of structured part latents ensures cohesion and quality in part-level outputs.
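The bounding-box conditioning and voxel discarding described above can be illustrated with a small sketch. It selects, for each part, the voxels that fall inside that part's axis-aligned box, then drops voxels whose keep score is low. This is not the paper's implementation: the function names, the half-open box convention, the per-voxel `keep_scores`, and the 0.5 threshold are hypothetical, and in the actual model the decision about which voxels are extraneous would come from the network during denoising.

```python
import numpy as np


def voxels_in_box(coords, box_min, box_max):
    """Boolean mask of voxels inside an axis-aligned box.

    coords: (V, 3) integer voxel coordinates.
    The box is half-open: box_min <= coord < box_max (an assumed convention).
    """
    coords = np.asarray(coords)
    inside = (coords >= np.asarray(box_min)) & (coords < np.asarray(box_max))
    return inside.all(axis=1)


def assign_voxels_to_parts(coords, boxes):
    """Return {part_index: (K, 3) voxel coords} for each (box_min, box_max) pair.

    Boxes may overlap, so a voxel can appear under more than one part.
    """
    coords = np.asarray(coords)
    return {
        i: coords[voxels_in_box(coords, lo, hi)]
        for i, (lo, hi) in enumerate(boxes)
    }


def discard_voxels(part_coords, keep_scores, threshold=0.5):
    """Keep only voxels whose (hypothetical) keep score reaches the threshold."""
    keep = np.asarray(keep_scores) >= threshold
    return np.asarray(part_coords)[keep]


# Hypothetical occupied voxels and two part boxes on an 8^3 grid.
occupied = np.array([[0, 0, 0], [1, 1, 1], [5, 5, 5], [7, 7, 7]])
part_boxes = [((0, 0, 0), (2, 2, 2)), ((4, 4, 4), (8, 8, 8))]
parts = assign_voxels_to_parts(occupied, part_boxes)
cleaned_part0 = discard_voxels(parts[0], keep_scores=[0.9, 0.1])
```

Because a box is only a coarse guide, voxels from neighbouring parts can fall inside it; a filtering step of this kind is what lets each part keep only the voxels that belong to it.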
Datasets and Evaluation
OmniPart’s evaluation leverages a dataset comprising 180K 3D objects with detailed part-level annotations. The performance is benchmarked against existing segmentation-based and direct part generative methods, including Part123 and PartGen. Comparisons underline OmniPart’s ability to deliver superior part independence without compromising global cohesiveness.
Quantitative evaluation uses Chamfer Distance and F1-score across multiple thresholds to assess both geometric and semantic fidelity at the part and object levels.
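For readers unfamiliar with these metrics, here is a minimal NumPy sketch of both, computed between two point sets sampled from a predicted and a ground-truth shape. Conventions vary between papers (squared or unsquared distances, sum or mean of the two directions), and the summary does not say which one OmniPart uses, so the choices below are assumptions. The brute-force pairwise distance is only suitable for small point sets.

```python
import numpy as np


def nearest_distances(a, b):
    """For each point in a, the Euclidean distance to its nearest point in b."""
    diff = a[:, None, :] - b[None, :, :]          # (len(a), len(b), 3)
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance: mean nearest distance in both directions.

    Assumed convention: unsquared distances, summed over the two directions.
    """
    return float(nearest_distances(pred, gt).mean()
                 + nearest_distances(gt, pred).mean())


def f_score(pred, gt, threshold):
    """F1-score at a distance threshold.

    Precision: fraction of predicted points within `threshold` of the ground truth.
    Recall: fraction of ground-truth points within `threshold` of the prediction.
    """
    precision = float((nearest_distances(pred, gt) < threshold).mean())
    recall = float((nearest_distances(gt, pred) < threshold).mean())
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical example: the prediction is the ground truth shifted by 0.1 along z.
gt_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred_points = gt_points + np.array([0.0, 0.0, 0.1])

cd = chamfer_distance(pred_points, gt_points)
f_tight = f_score(pred_points, gt_points, threshold=0.05)
f_loose = f_score(pred_points, gt_points, threshold=0.2)
```

Reporting the F1-score at several thresholds shows how accuracy changes as the tolerance tightens, while Chamfer Distance summarizes the average surface deviation in a single number.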
Figure 3: Visualization of the training dataset. The dataset facilitates comprehensive evaluation through diverse part-count demonstrations.
Applications and Implications
OmniPart's flexible design fosters several downstream applications such as compositional editing, mask-controlled generation, and material customization. Tailored granularity control via 2D masks allows users to define specific structural patterns and apply independent texture modifications to parts.
Figure 4: Applications of our part-aware 3D generation framework. Part-aware outputs bolster a range of practical applications, demonstrating enhanced generation versatility.
Because it builds on structured latent representations, OmniPart synthesizes all parts concurrently, which improves efficiency while supporting high-quality geometry processing. By keeping semantic coupling across components low, it offers a modular generation approach with significant implications for 3D-centric disciplines.
Conclusion
OmniPart is a robust framework for part-aware 3D generation that separates structural planning from detailed synthesis. Its use of autoregressive models and its adaptability to existing holistic generators mark a significant step towards more interpretable and interactive 3D assets. It still relies on axis-aligned bounding boxes, a limitation that leaves room for future work on improving precision without losing sight of structural coordination.
Overall, OmniPart paves the way for more comprehensive and scalable 3D modeling, reinforcing its use in contemporary visual computing and design.