LION: Latent Point Diffusion Models for 3D Shape Generation (2210.06978v1)

Published 12 Oct 2022 in cs.CV, cs.LG, and stat.ML

Abstract: Denoising diffusion models (DDMs) have shown promising results in 3D point cloud synthesis. To advance 3D DDMs and make them useful for digital artists, we require (i) high generation quality, (ii) flexibility for manipulation and applications such as conditional synthesis and shape interpolation, and (iii) the ability to output smooth surfaces or meshes. To this end, we introduce the hierarchical Latent Point Diffusion Model (LION) for 3D shape generation. LION is set up as a variational autoencoder (VAE) with a hierarchical latent space that combines a global shape latent representation with a point-structured latent space. For generation, we train two hierarchical DDMs in these latent spaces. The hierarchical VAE approach boosts performance compared to DDMs that operate on point clouds directly, while the point-structured latents are still ideally suited for DDM-based modeling. Experimentally, LION achieves state-of-the-art generation performance on multiple ShapeNet benchmarks. Furthermore, our VAE framework allows us to easily use LION for different relevant tasks: LION excels at multimodal shape denoising and voxel-conditioned synthesis, and it can be adapted for text- and image-driven 3D generation. We also demonstrate shape autoencoding and latent shape interpolation, and we augment LION with modern surface reconstruction techniques to generate smooth 3D meshes. We hope that LION provides a powerful tool for artists working with 3D shapes due to its high-quality generation, flexibility, and surface reconstruction. Project page and code: https://nv-tlabs.github.io/LION.

Summary

The paper introduces LION, a model that leverages VAEs and denoising diffusion models to generate expressive, high-quality 3D shapes.
Its hierarchical latent space design effectively disentangles global structure from local details, achieving state-of-the-art results on ShapeNet benchmarks.
LION demonstrates flexibility through voxel-guided synthesis and smooth mesh reconstruction, enabling interactive and application-driven 3D generation.

Overview of LION: Latent Point Diffusion Models for 3D Shape Generation

This paper presents LION, a novel hierarchical architecture combining Variational Autoencoders (VAEs) with Denoising Diffusion Models (DDMs) for 3D shape generation. It addresses three critical needs in the domain of 3D generative models: generating realistic and high-quality shapes, enabling flexible and interactive model use, and outputting smooth meshes suitable for graphics software. The paper demonstrates LION’s capabilities in providing enhanced expressivity and flexibility in generative modeling of 3D point clouds.

Key features of LION include its hierarchical latent space, which pairs a global shape latent vector with a point-structured latent space. Training DDMs in these latent spaces, rather than directly on point clouds, improves performance over point-cloud DDMs, while the point-structured latents remain well suited to DDM-based modeling. With this design, LION achieves state-of-the-art generation results on multiple ShapeNet benchmarks.
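The two-stage generation implied by this hierarchy can be sketched as follows. This is a minimal illustration under stated assumptions, not LION's actual implementation. It assumes a discrete DDPM-style ancestral sampler with a linear noise schedule, and it reads "hierarchical" as the point-latent model being conditioned on the sampled global latent. The names `zero_eps` and `sample_shape_latents` are hypothetical and the dimensions are illustrative. The placeholder noise predictor stands in for trained networks, and the VAE decoder that would map the latents to a point cloud is omitted.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; returns betas and cumulative products of (1 - beta)."""
    denom = max(num_steps - 1, 1)
    betas = [beta_start + (beta_end - beta_start) * i / denom
             for i in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars


def ancestral_sample(eps_model, size, betas, alpha_bars, rng, cond=None):
    """DDPM-style ancestral sampling of a flat vector with `size` entries."""
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]  # start from pure noise
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t, cond)  # predicted noise at step t
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        scale = 1.0 / math.sqrt(1.0 - betas[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        x = [scale * (xi - coef * ei) + sigma * rng.gauss(0.0, 1.0)
             for xi, ei in zip(x, eps)]
    return x


def zero_eps(x, t, cond):
    """Hypothetical placeholder for a trained noise predictor (always predicts 0)."""
    return [0.0] * len(x)


def sample_shape_latents(global_eps, point_eps, num_points, global_dim=128,
                         point_feat_dim=1, num_steps=50, seed=0):
    """Sample a global latent first, then point latents conditioned on it."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    # Stage 1: global shape latent (one vector per shape).
    z0 = ancestral_sample(global_eps, global_dim, betas, alpha_bars, rng)
    # Stage 2: point-structured latents (xyz plus extra features per point),
    # with the global latent passed in as conditioning (an assumption here).
    width = 3 + point_feat_dim
    flat = ancestral_sample(point_eps, num_points * width, betas, alpha_bars,
                            rng, cond=z0)
    h0 = [flat[i * width:(i + 1) * width] for i in range(num_points)]
    return z0, h0


z0, h0 = sample_shape_latents(zero_eps, zero_eps, num_points=16)
print(len(z0), len(h0), len(h0[0]))  # 128 16 4
```

The sketch keeps the point latents as a list of per-point vectors to stress that they retain point-cloud structure, which is what lets a point-cloud network serve as the second-stage denoiser.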

Architectural Contributions

LION’s architecture distinguishes itself by offering:

Expressivity: The use of hierarchical latent spaces allows LION to capture complex geometric structures more efficiently. Latent diffusion models learn a smoothed distribution over these latent spaces, which simplifies the modeling task and improves expressivity. This setup yields a natural disentanglement between global shapes and local details, an advantageous characteristic for generating high-quality shapes.
Flexibility: LION’s VAE framework equips it for multiple tasks, including multimodal shape denoising, voxel-conditioned synthesis, and latent shape interpolation. Fine-tuning LION’s encoder networks on voxelized or noisy inputs enables voxel-guided synthesis and denoising, producing plausible clean shapes from these perturbed inputs. The model can also incorporate CLIP embeddings to enable image- and text-driven 3D generation.
Mesh Reconstruction: By integrating Shape As Points (SAP) for surface reconstruction, LION can output smooth meshes. SAP is fine-tuned on data generated by LION to specialize in the noise characteristic of LION-generated point clouds, improving mesh quality.

Experimental Validation

LION’s efficacy is extensively validated through:

Single-class and Many-class Shape Generation: LION achieves state-of-the-art performance in single-class settings on ShapeNet categories such as airplanes, chairs, and cars. It also scales to more challenging settings, generating shapes across 13 or even 55 diverse ShapeNet categories without class conditioning. On the 1-NNA (1-nearest-neighbor accuracy) metric, it improves significantly over existing methods.
Applications: The model's versatility is displayed through applications like voxel-guided synthesis, where it outperforms baselines in maintaining fidelity to voxel inputs while providing diverse outputs. Moreover, LION exhibits strong performance in autoencoding and shape interpolation tasks, enabling seamless transitions between complex shapes in its latent space.
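The 1-NNA metric cited for shape generation can be illustrated with a short sketch. It merges the generated and reference sets, classifies each shape by the label of its nearest neighbor (leave-one-out), and reports the accuracy. A value near 50% means the two sets are hard to tell apart, while 100% means they are trivially separable. The code below is a minimal pure-Python version that assumes Chamfer distance as the shape distance. The function names and toy data are hypothetical, and this is not the evaluation code used in the paper.

```python
import random


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point clouds (lists of xyz tuples)."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_sided(src, dst):
        # Mean squared distance from each source point to its nearest target point.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(a, b) + one_sided(b, a)


def one_nna(generated, reference, distance=chamfer_distance):
    """Leave-one-out 1-nearest-neighbor accuracy over the merged shape sets."""
    shapes = [(s, 1) for s in generated] + [(s, 0) for s in reference]
    correct = 0
    for i, (shape_i, label_i) in enumerate(shapes):
        # Nearest neighbor among all other shapes, regardless of origin.
        nearest = min((j for j in range(len(shapes)) if j != i),
                      key=lambda j: distance(shape_i, shapes[j][0]))
        correct += shapes[nearest][1] == label_i
    return correct / len(shapes)


def toy_cloud(x_offset, rng, num_points=8):
    """Hypothetical toy shape: random points in a unit cube shifted along x."""
    return [(x_offset + rng.random(), rng.random(), rng.random())
            for _ in range(num_points)]


rng = random.Random(0)
generated = [toy_cloud(0.0, rng) for _ in range(3)]
reference = [toy_cloud(10.0, rng) for _ in range(3)]
print(one_nna(generated, reference))  # 1.0 for these well-separated toy sets
```

The brute-force nearest-neighbor search is quadratic in both the number of shapes and the number of points, so a practical evaluation would use a batched, accelerated distance computation instead.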

Implications and Future Directions

LION’s contribution lies in its improved and flexible approach to 3D shape generation, providing a robust toolset for digital artists and applications demanding high-quality and diverse 3D models. This framework holds potential for extensions in several directions:

Texture Synthesis and Full Scene Generation: Future work could add support for synthesizing textures and could extend the approach beyond single-object generation to full 3D scene synthesis.
Real-time Applications: The model can benefit from accelerated sampling techniques to facilitate real-time interactive applications, which are increasingly pertinent for immersive media and gaming industries.

In summary, LION represents a substantial advance in 3D generative modeling, delivering high-quality and flexible generation of complex 3D shapes together with smooth mesh output. Its combination of VAEs and latent DDMs is a compelling strategy for addressing longstanding challenges in the 3D modeling domain.
