Scaling native 3D generative models to large datasets

Determine effective methodologies to scale native 3D generative models—trained directly on 3D representations such as point clouds, polygonal meshes, or neural fields—to large 3D asset datasets, enabling robust generalization beyond limited shape categories.

Background

In the Related Works section, the paper surveys approaches that directly train 3D generative or diffusion models on native 3D representations. These methods often rely on relatively small public datasets, which restricts their validation to limited categories and hinders broad generalization.
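
To make concrete what "training directly on a native 3D representation" involves, below is a minimal, hypothetical sketch of a denoising-diffusion training step on raw point clouds. The model class, noise schedule, and all names are illustrative assumptions, not code from the surveyed works or from Wonder3D++.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a diffusion training step operating directly on point clouds
# (a "native" 3D representation), rather than on rendered 2D views.
# PointCloudDenoiser and all constants below are illustrative assumptions.

class PointCloudDenoiser(nn.Module):
    """Tiny per-point MLP conditioned on the (normalized) diffusion timestep."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 3),  # predicts the noise added to each point
        )

    def forward(self, noisy_points: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # noisy_points: (B, N, 3); t: (B,) timesteps normalized to [0, 1]
        t_feat = t[:, None, None].expand(-1, noisy_points.shape[1], 1)
        return self.net(torch.cat([noisy_points, t_feat], dim=-1))

def diffusion_training_step(model, optimizer, clean_points, num_steps=1000):
    """One DDPM-style step: add Gaussian noise to the raw point cloud and regress it."""
    B = clean_points.shape[0]
    t = torch.randint(1, num_steps + 1, (B,), device=clean_points.device)
    alpha_bar = torch.cos(0.5 * torch.pi * t / num_steps) ** 2  # simple cosine schedule
    noise = torch.randn_like(clean_points)
    noisy = (alpha_bar.sqrt()[:, None, None] * clean_points
             + (1 - alpha_bar).sqrt()[:, None, None] * noise)
    pred = model(noisy, t.float() / num_steps)
    loss = nn.functional.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage on a dummy batch of 2048-point clouds:
model = PointCloudDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
points = torch.randn(4, 2048, 3)
loss = diffusion_training_step(model, opt, points)
```

The open problem is precisely how to train models of this kind, at scale, on large and diverse 3D asset collections rather than on a handful of shape categories.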

The authors explicitly note that, despite progress, the community has not yet established how to scale such native 3D models to large datasets. Wonder3D++ sidesteps this by leveraging 2D diffusion priors with cross-domain (normal and color) multi-view generation and a cascaded mesh extraction pipeline, but the open question remains pertinent for scaling native 3D generative modeling itself.
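
For contrast, the following is a hedged structural sketch of the two routes described above. Every function here is an illustrative stub returning dummy data; only the two-stage structure (cross-domain multi-view generation, then cascaded mesh extraction) is taken from the text, and none of these names are APIs from the paper.

```python
import numpy as np

def generate_multiview_normals_and_colors(image: np.ndarray, n_views: int = 6):
    """Stage 1 (stub): cross-domain multi-view diffusion would emit paired normal/color views."""
    h, w, _ = image.shape
    normals = np.zeros((n_views, h, w, 3), dtype=np.float32)
    colors = np.zeros((n_views, h, w, 3), dtype=np.float32)
    return normals, colors

def extract_mesh_cascaded(normals: np.ndarray, colors: np.ndarray):
    """Stage 2 (stub): cascaded (coarse-to-fine) mesh extraction would lift the views to geometry."""
    vertices = np.zeros((0, 3), dtype=np.float32)  # placeholder mesh
    faces = np.zeros((0, 3), dtype=np.int64)
    return vertices, faces

def image_to_3d_via_2d_prior(image: np.ndarray):
    """2D-prior route: reuse large pretrained 2D diffusion models, then reconstruct a mesh."""
    normals, colors = generate_multiview_normals_and_colors(image)
    return extract_mesh_cascaded(normals, colors)

# Example usage with a dummy input image:
dummy_image = np.zeros((256, 256, 3), dtype=np.float32)
mesh_vertices, mesh_faces = image_to_3d_via_2d_prior(dummy_image)
```

The native-3D route would instead sample geometry directly from a model trained on 3D data, which is exactly where the scaling question arises.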

References

However, due to the limited size of publicly available 3D asset datasets, most of these works have only been validated on limited categories of shapes or lack sufficient generalization, and how to scale up to large datasets is still an open problem.

Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image (2511.01767 - Yang et al., 3 Nov 2025) in Section 2.3, 3D Generative Models (Related Works)