PolyDiff: Generating 3D Polygonal Meshes with Diffusion Models (2312.11417v1)

Published 18 Dec 2023 in cs.CV

Abstract: We introduce PolyDiff, the first diffusion-based approach capable of directly generating realistic and diverse 3D polygonal meshes. In contrast to methods that use alternate 3D shape representations (e.g. implicit representations), our approach is a discrete denoising diffusion probabilistic model that operates natively on the polygonal mesh data structure. This enables learning of both the geometric properties of vertices and the topological characteristics of faces. Specifically, we treat meshes as quantized triangle soups, progressively corrupted with categorical noise in the forward diffusion phase. In the reverse diffusion phase, a transformer-based denoising network is trained to revert the noising process, restoring the original mesh structure. At inference, new meshes can be generated by applying this denoising network iteratively, starting with a completely noisy triangle soup. Consequently, our model is capable of producing high-quality 3D polygonal meshes, ready for integration into downstream 3D workflows. Our extensive experimental analysis shows that PolyDiff achieves a significant advantage (avg. FID and JSD improvement of 18.2 and 5.8 respectively) over current state-of-the-art methods.


Summary

  • The paper introduces PolyDiff, a diffusion model that directly denoises quantized triangle soups to generate high-quality 3D polygonal meshes.
  • It employs a transformer-based network to preserve discrete mesh data, ensuring accurate geometric and topological fidelity during generation.
  • Evaluations on ShapeNet using metrics like MMD, COV, 1-NNA, and JSD demonstrate significant improvements over state-of-the-art methods.

Introduction

The development of methods for 3D polygonal mesh generation represents significant progress in computer graphics and 3D modeling. Traditional approaches to crafting high-fidelity 3D meshes are labor-intensive and demand considerable effort from skilled artists. There is growing interest in generative models that automate this process, enabling a more efficient workflow for creating 3D content for applications such as video games, movies, and virtual reality. The proposed model, PolyDiff, addresses this need by operating directly on mesh data, a departure from previous methods that rely on alternate 3D representations.

PolyDiff Generative Model

PolyDiff is tailored to generating 3D polygonal meshes through a diffusion-based approach. Instead of converting 3D shapes into formats such as voxels, point clouds, or distance fields, PolyDiff works directly with the polygonal mesh structure, preserving geometric and topological fidelity that would otherwise be diminished by conversion. This is achieved by treating the mesh data as "quantized triangle soups," which are gradually denoised by a transformer-based network during the reverse diffusion phase. At inference time, PolyDiff iteratively refines a fully noised triangle soup into a high-quality, realistic 3D mesh.
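
To make the sampling procedure concrete, here is a minimal sketch of what such an iterative reverse-diffusion loop could look like, assuming a trained transformer `denoiser` that maps noisy tokens and a timestep to per-token logits over the quantization bins. The function name, shapes, and the simplification of re-sampling tokens directly from the predicted distribution at each step are illustrative assumptions, not the paper's actual interface or posterior.

```python
import torch

@torch.no_grad()
def sample_mesh(denoiser, num_faces=800, num_bins=128, num_steps=1000, device="cpu"):
    """Iteratively denoise a fully random quantized triangle soup.

    `denoiser(tokens, timestep)` is assumed to return logits of shape
    (batch, num_tokens, num_bins); this interface is hypothetical.
    """
    # Each triangle contributes 3 vertices x 3 coordinates = 9 discrete tokens.
    tokens = torch.randint(0, num_bins, (1, num_faces * 9), device=device)

    for t in reversed(range(num_steps)):
        timestep = torch.full((1,), t, dtype=torch.long, device=device)
        logits = denoiser(tokens, timestep)              # (1, num_faces*9, num_bins)
        probs = torch.softmax(logits, dim=-1)
        # Simplification: re-sample tokens from the predicted distribution;
        # a full discrete DDPM samples from the posterior q(x_{t-1} | x_t, x_0).
        tokens = torch.multinomial(probs.view(-1, num_bins), 1).view(1, -1)

    # Map bin indices back to continuous coordinates in [-1, 1].
    coords = tokens.float() / (num_bins - 1) * 2.0 - 1.0
    return coords.view(num_faces, 3, 3)                  # (faces, vertices, xyz)
```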

Technical Approach

PolyDiff introduces a strategy in which meshes are represented as quantized triangle soups that include both vertex and face data. Throughout the noising and denoising process, the discrete nature of the representation is preserved, which is crucial because the quantized vertex coordinates and the face structure are inherently discrete. By training a network to map from noised to clean meshes, PolyDiff learns the complex distribution of realistic 3D shapes. Evaluation metrics such as Minimum Matching Distance (MMD), Coverage (COV), 1-Nearest-Neighbor Accuracy (1-NNA), and Jensen-Shannon Divergence (JSD) are used to assess the quality of generated shapes, showing that PolyDiff produces cleaner and more diverse meshes than previous methods.
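
The sketch below illustrates one plausible way to build such a quantized triangle soup and apply a single forward-diffusion corruption step with a uniform categorical transition. The bin count, noise schedule parameter, and array layout are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def mesh_to_quantized_soup(vertices, faces, num_bins=128):
    """vertices: (V, 3) floats normalized to [-1, 1]; faces: (F, 3) vertex indices.

    Returns an (F, 9) integer array: 9 quantized coordinate tokens per triangle.
    """
    bins = np.clip(np.round((vertices + 1.0) / 2.0 * (num_bins - 1)), 0, num_bins - 1)
    soup = bins[faces]                      # (F, 3, 3): each face carries copies of its vertices
    return soup.astype(np.int64).reshape(len(faces), 9)

def corrupt(tokens, beta_t, num_bins=128, seed=0):
    """One forward-diffusion step: with probability beta_t, replace a token
    with a uniformly random bin (a uniform categorical transition)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(tokens.shape) < beta_t
    noise = rng.integers(0, num_bins, size=tokens.shape)
    return np.where(mask, noise, tokens)
```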

Experimental Results

The model is evaluated on the ShapeNet dataset across several object categories and shows clear improvements over prior methods, outperforming the current state of the art by substantial margins in perceived quality and shape fidelity. Importantly, the model does not simply memorize training samples but generates novel mesh structures. Qualitative results, illustrated with comparative visual samples, show that PolyDiff produces cleaner and more realistic meshes than competing methods, which suffer from issues such as over-smoothing and incomplete structures.
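
A common sanity check for memorization, independent of the paper's exact protocol, is to retrieve each generated shape's nearest neighbor in the training set under Chamfer distance on sampled surface points; consistently large distances argue against simple copying. The brute-force sketch below assumes point clouds have already been sampled from the meshes, and the function names are illustrative.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a (N, 3) and b (M, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)   # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def nearest_training_distances(generated, training_set):
    """For each generated point cloud, the Chamfer distance to its closest training shape."""
    return [min(chamfer_distance(g, t) for t in training_set) for g in generated]
```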

Conclusions and Future Directions

PolyDiff presents a groundbreaking approach to 3D content generation, offering a robust solution for producing polygonal meshes that are ready for integration into downstream workflows. It holds promise for dramatically reducing the workload of 3D artists, enhancing creativity in the design process, and enabling applications in entertainment and virtual environments. Like most models, PolyDiff has limitations, such as the difficulty of scene-level generation and the inherently slow sampling of diffusion models. Future work could focus on addressing these aspects and extending the model to more complex and diverse generative tasks.
