ProcGen3D: Procedural 3D Asset Generation
- ProcGen3D is a framework that unifies procedural graph grammars with neural transformer models to achieve scalable and interpretable 3D asset generation.
- It employs edge-based tokenization and autoregressive modeling to convert image cues into compact, editable 3D representations with high fidelity.
- The integration of MCTS-guided sampling ensures semantic and geometric consistency, maintaining both global structure and fine details in 3D reconstructions.
ProcGen3D refers to a significant line of research and practical systems for procedural generation and neural representation in 3D content creation, image-based reconstruction, and interactive editing. In contemporary literature, "ProcGen3D" encompasses both model- and data-driven methods for compact, interpretable, and highly controllable 3D asset generation—from invertible procedural grammars to transformer-based neural autoregressive models, with applications ranging from assets for visual effects and games to novel urban and natural scene synthesis. The central theme is the unification of parametric, grammar-based, or procedural representations with machine learning techniques to enable scalable, diverse, and editable 3D content.
1. Procedural Graph Representations of 3D Objects
Core to ProcGen3D (Zhang et al., 10 Nov 2025) is the abstraction of 3D assets as procedural graphs $G = (V, E)$:
- $V$ is a set of nodes corresponding to discrete semantic components, each carrying attribute vectors (e.g., 3D coordinates, radii, or tags).
- $E$ is a set of edges encoding relations such as connectivity or attachment, with attributes (e.g., limb type, edge length).
- The graph is generated via a context-free graph grammar, whose production rules sequentially expand nonterminals (uninstantiated nodes) into structural and attribute-labeled subgraphs.
This representation enables a decoupling of the procedural space (underlying parametric, rule-based model) from explicit geometric details, allowing image-to-3D systems to output generator-friendly, compact, and interpretable descriptions that are directly usable by procedural mesh generators (e.g., Infinigen, CEM).
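To make the abstraction concrete, the following sketch shows one plausible encoding of such a procedural graph together with a toy production rule; the class names, attribute layout, and branching rule are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative encoding of a procedural graph G = (V, E); the paper's actual
# node/edge attribute schemas are category-specific and not reproduced here.
@dataclass
class Node:
    node_id: int
    semantic_class: str                        # e.g., "trunk", "branch", "pier"
    attrs: dict = field(default_factory=dict)  # e.g., {"pos": (x, y, z), "radius": r}
    is_nonterminal: bool = False               # uninstantiated, awaiting expansion

@dataclass
class Edge:
    src: int
    dst: int
    edge_class: str                            # e.g., "limb", "attachment"
    attrs: dict = field(default_factory=dict)  # e.g., {"length": l}

@dataclass
class ProcGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)

def expand_branch(graph: ProcGraph, parent_id: int, n_children: int = 2):
    """Toy production rule: rewrite a nonterminal node into an
    attribute-labeled subgraph of child branches."""
    graph.nodes[parent_id].is_nonterminal = False
    for _ in range(n_children):
        child_id = max(graph.nodes) + 1
        graph.nodes[child_id] = Node(child_id, "branch", is_nonterminal=True)
        graph.edges.append(Edge(parent_id, child_id, "limb"))
```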
2. Edge-Based Tokenization for Neural Sequentialization
To bridge procedural abstraction with neural sequence modeling, ProcGen3D utilizes edge-based tokenization. Each edge is mapped to a token tuple: continuous attributes are discretized into a fixed number of bins, and categorical class labels are enumerated, yielding a total vocabulary of hundreds to thousands of tokens.
A traversal order (DFS for plants, BFS for bridges) produces a linear edge sequence, with consecutive edges delimited by a special separator token. The resulting sequence supplies the autoregressive context required by transformer architectures; a minimal sketch follows.
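The sketch below reuses the illustrative Edge/ProcGraph classes from Section 1; the bin count, separator name, and attribute ordering are assumptions, as the paper's exact vocabulary design is not reproduced here.

```python
N_BINS = 128   # assumed bin count; the paper's exact discretization is not given here
SEP = "<SEP>"  # assumed name for the special edge-separator token

def quantize(value: float, lo: float, hi: float, n_bins: int = N_BINS) -> str:
    """Discretize a continuous attribute into one of n_bins uniform bins."""
    t = (value - lo) / (hi - lo)
    return f"bin_{min(n_bins - 1, max(0, int(t * n_bins)))}"

def edge_to_tokens(edge, attr_ranges: dict) -> list:
    """Map one edge to its token tuple: enumerated class label, then
    discretized continuous attributes in a fixed order."""
    tokens = [f"cls_{edge.edge_class}"]
    for name in sorted(edge.attrs):
        lo, hi = attr_ranges[name]
        tokens.append(quantize(edge.attrs[name], lo, hi))
    return tokens

def graph_to_sequence(graph, root_id: int, attr_ranges: dict) -> list:
    """DFS edge traversal (as used for plants) flattens the graph into a
    linear token sequence, with SEP delimiting consecutive edges."""
    adj = {}
    for e in graph.edges:
        adj.setdefault(e.src, []).append(e)
    seq, stack, visited = [], [root_id], set()
    while stack:
        nid = stack.pop()
        if nid in visited:
            continue
        visited.add(nid)
        for e in adj.get(nid, []):
            seq += edge_to_tokens(e, attr_ranges) + [SEP]
            stack.append(e.dst)
    return seq
```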
3. Autoregressive Transformer Priors for Procedural Graph Generation
ProcGen3D employs a GPT-style transformer to model $p(t_i \mid t_{<i}, I)$, where the tokens $t_{<i}$ represent the procedural graph prefix and $I$ is an image embedding (e.g., from a ResNet encoder). The transformer architecture mirrors the OPT-350M configuration: 24 layers, 1024-dimensional embeddings and hidden states, 16 attention heads, and 4096-dimensional feed-forward blocks.
Each token receives a learned embedding and positional encoding; image context is fused via concatenation or cross-attention. Training minimizes the standard autoregressive cross-entropy loss:

$$\mathcal{L} = -\sum_{i} \log p_\theta\!\left(t_i \mid t_{<i}, I\right)$$
This formulation allows learned procedural priors to be conditioned on diverse, real-world imagery, while maintaining procedural structure and editability in the output.
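As a concrete illustration, here is a minimal PyTorch sketch of such a conditioned decoder, fusing the image embedding as a single prefix token; the fusion choice, the 2048-dimensional ResNet feature size, and all class/function names are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class ProcGraphPrior(nn.Module):
    """GPT-style decoder modeling p(t_i | t_<i, I), with the image
    embedding I fused as a single prefix token (one plausible choice)."""
    def __init__(self, vocab_size, d_model=1024, n_layers=24, n_heads=16,
                 d_ff=4096, max_len=2048, d_image=2048):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len + 1, d_model)
        self.img_proj = nn.Linear(d_image, d_model)  # assumed ResNet feature dim
        layer = nn.TransformerEncoderLayer(d_model, n_heads, d_ff,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, img_feat):
        # tokens: (B, T) token ids; img_feat: (B, d_image) image embedding.
        T = tokens.size(1)
        x = torch.cat([self.img_proj(img_feat).unsqueeze(1),  # image prefix
                       self.tok_emb(tokens)], dim=1)          # (B, T + 1, d)
        x = x + self.pos_emb(torch.arange(T + 1, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(T + 1).to(tokens.device)
        h = self.blocks(x, mask=mask)
        # Position i attends to (image, t_0..t_{i-1}) and predicts t_i;
        # the final position predicts past the end and is dropped.
        return self.head(h[:, :-1])                           # (B, T, vocab)

def ar_loss(model, tokens, img_feat):
    """Standard autoregressive cross-entropy over the edge-token sequence."""
    logits = model(tokens, img_feat)
    return nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                       tokens.reshape(-1))
```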
4. MCTS-Guided Sampling for Image-Consistent 3D Reconstruction
To enforce semantic and geometric faithfulness between the generated procedural graph and the input image, ProcGen3D introduces an MCTS-guided decoding procedure. At test time:
- The search state is a partial token sequence.
- Successor states extend the partial sequence with candidate next edges proposed from the transformer's logits.
- The selection strategy employs an Upper Confidence Bound, $\mathrm{UCB}(s, a) = \bar{Q}(s, a) + c \sqrt{\ln N(s) / N(s, a)}$, where $\bar{Q}(s, a)$ is the mean rollout reward of taking candidate edge $a$ in state $s$, $N(\cdot)$ are visit counts, and $c$ trades off exploration against exploitation.
- Simulations roll out the transformer predictor for a fixed number of further steps, assemble the procedural asset, and compute a silhouette-based reward quantifying agreement between the rendered prediction $\hat{M}$ and the input mask $M$, e.g. $r = \mathrm{IoU}(\hat{M}, M)$.
Within the search budget, the most promising next edge is committed at each decoding iteration, yielding discrete, interpretable, and image-matched procedural graphs (see the sketch below).
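A compact sketch of this decoding loop follows, assuming an IoU silhouette reward and UCT-style selection; the exploration constant, budgets, and the `model.top_candidates` / `model.rollout` / `render_silhouette` helpers are hypothetical stand-ins for the paper's exact components.

```python
import math

C_EXPLORE = 1.0      # assumed exploration constant
N_SIMS = 64          # assumed simulations per emitted edge
ROLLOUT_STEPS = 32   # assumed rollout horizon

def iou(pred_mask, gt_mask):
    """Silhouette agreement as intersection-over-union of boolean masks."""
    inter = (pred_mask & gt_mask).sum()
    union = (pred_mask | gt_mask).sum()
    return inter / max(union, 1)

class SearchNode:
    def __init__(self, seq):
        self.seq = seq           # search state: a partial token sequence
        self.children = {}       # candidate next-edge token -> SearchNode
        self.N, self.W = 0, 0.0  # visit count, cumulative reward

    def ucb(self, child):
        if child.N == 0:
            return float("inf")
        return child.W / child.N + C_EXPLORE * math.sqrt(math.log(self.N) / child.N)

def select_next_edge(root, model, render_silhouette, gt_mask):
    """One decoding iteration: run simulations, return the most-visited edge."""
    for _ in range(N_SIMS):
        node, path = root, [root]
        # Selection: descend via UCB to an unexpanded node.
        while node.children:
            parent = node
            node = max(parent.children.values(), key=lambda c: parent.ucb(c))
            path.append(node)
        # Expansion: candidate edges proposed from the transformer's logits.
        for tok in model.top_candidates(node.seq):        # hypothetical helper
            node.children[tok] = SearchNode(node.seq + [tok])
        # Simulation: roll out, assemble the asset, render, score vs. mask.
        rollout = model.rollout(node.seq, ROLLOUT_STEPS)  # hypothetical helper
        reward = iou(render_silhouette(rollout), gt_mask)
        # Backpropagation of the silhouette reward along the visited path.
        for n in path:
            n.N += 1
            n.W += reward
    return max(root.children, key=lambda tok: root.children[tok].N)
```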
5. Comparative Evaluation and Ablations
ProcGen3D is empirically validated on synthetic datasets of cacti, trees, and bridges (each with 10,000 instances; node counts from 20 to 600), employing metrics such as chamfer distance (CD), LPIPS, and CLIP similarity.
| Category | CD ↓ | LPIPS ↓ | CLIP-Sim ↑ |
|---|---|---|---|
| Cactus | 0.0297 | 0.097 | 0.9268 |
| Tree | 0.0265 | 0.081 | 0.9769 |
| Leafy Tree | 0.0648 | 0.168 | 0.9493 |
| Pine Tree | 0.0302 | 0.079 | 0.9680 |
| Bridge | 0.0141 | 0.052 | 0.9820 |
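For reference, the chamfer distance reported above can be computed as in this minimal NumPy sketch; the point-sampling density and normalization conventions are assumptions, since the paper's exact evaluation protocol is not reproduced here.

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric chamfer distance between point sets p (N, 3) and q (M, 3):
    mean nearest-neighbor squared distance in both directions. Whether
    squared or unsquared distances are averaged varies across papers."""
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# e.g., point clouds sampled from predicted and ground-truth meshes
pred = np.random.rand(1024, 3)
gt = np.random.rand(1024, 3)
print(chamfer_distance(pred, gt))
```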
Ablation studies demonstrate:
- Superior performance from the incorporation of RGB cues and DFS tokenization (e.g., Tree: mask input gives CD=0.0485, RGB input gives CD=0.0265).
- Consistent improvement from MCTS guidance versus naive autoregressive decoding.
- The ability to preserve both global topological and fine structural fidelity, outperforming alternatives such as neural fields post-processed with Marching Cubes.
6. Limitations, Applicability, and Extensions
Limitations:
- Applicability is restricted by the availability of procedural generators for the target asset category.
- MCTS-based search introduces significant test-time compute overhead, especially for highly complex graphs (e.g., tens of minutes for large trees).
ProcGen3D directly enables:
- Generation of parametric 3D asset libraries amenable to high-level editing and style transfer.
- Automated stylized asset creation from photographic inputs, supporting rapid iteration.
- Generalization to domains with rule-governed structure (e.g., urban architecture, plant modeling).
Planned or suggested future work includes development of differentiable procedural surrogates for end-to-end learning and expanding grammar expressiveness to broader asset classes.
7. Context within the Procedural Generation Ecosystem
ProcGen3D advances the paradigm of procedural modeling by bridging explicit, interpretable graph grammars with neural autoregressive modeling and search. It complements prior work on inverse procedural modeling (e.g., genetic/memetic optimization of generator parameters (Garifullin et al., 2023)) and grammar-based L-systems, and remains distinct from mesh-centric generative methods (e.g., voxel fields, mesh autoencoders).
The methodology is robust to domain shift: models are trained on synthetic data yet generalize to real imagery. The sequenced, interpretable procedural graphs are immediately compatible with downstream rule-based generators, enabling both high-level controllability and production-ready mesh output.