PolyDiff: Generating 3D Polygonal Meshes with Diffusion Models (2312.11417v1)

Published 18 Dec 2023 in cs.CV

Abstract: We introduce PolyDiff, the first diffusion-based approach capable of directly generating realistic and diverse 3D polygonal meshes. In contrast to methods that use alternate 3D shape representations (e.g. implicit representations), our approach is a discrete denoising diffusion probabilistic model that operates natively on the polygonal mesh data structure. This enables learning of both the geometric properties of vertices and the topological characteristics of faces. Specifically, we treat meshes as quantized triangle soups, progressively corrupted with categorical noise in the forward diffusion phase. In the reverse diffusion phase, a transformer-based denoising network is trained to revert the noising process, restoring the original mesh structure. At inference, new meshes can be generated by applying this denoising network iteratively, starting with a completely noisy triangle soup. Consequently, our model is capable of producing high-quality 3D polygonal meshes, ready for integration into downstream 3D workflows. Our extensive experimental analysis shows that PolyDiff achieves a significant advantage (avg. FID and JSD improvement of 18.2 and 5.8 respectively) over current state-of-the-art methods.


Summary

  • The paper introduces PolyDiff, a diffusion model that directly denoises quantized triangle soups to generate high-quality 3D polygonal meshes.
  • It employs a transformer-based network to preserve discrete mesh data, ensuring accurate geometric and topological fidelity during generation.
  • Evaluations on ShapeNet using metrics like MMD, COV, 1-NNA, and JSD demonstrate significant improvements over state-of-the-art methods.

Introduction

The development of methods for 3D polygonal mesh generation represents significant progress in computer graphics and 3D modeling. Traditional approaches to crafting high-fidelity 3D meshes are labor-intensive and demand considerable effort from skilled artists. There is growing interest in generative models that automate this process, enabling a more efficient workflow for creating 3D content for applications such as video games, movies, and virtual reality. The proposed model, PolyDiff, addresses this need by operating directly on mesh data, a departure from previous methods that rely on alternate 3D representations.

PolyDiff Generative Model

PolyDiff is tailored to generating 3D polygonal meshes through a diffusion-based approach. Instead of converting 3D shapes into formats such as voxels, point clouds, or distance fields, PolyDiff works directly with the polygonal mesh structure, preserving geometric and topological fidelity that would otherwise be diminished by conversion. This is achieved by treating the mesh data as "quantized triangle soups," which are gradually denoised by a transformer-based network during the reverse diffusion phase. At inference time, PolyDiff iteratively refines a fully noised triangle soup into a high-quality, realistic 3D mesh.
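
To make the sampling procedure concrete, here is a minimal sketch of what such an iterative reverse-diffusion loop could look like, assuming a trained transformer `denoiser` that maps noisy tokens and a timestep to per-token logits over the quantization bins. The function name, shapes, and the simplification of re-sampling tokens directly from the predicted distribution at each step are illustrative assumptions, not the paper's actual interface or posterior.

```python
import torch

@torch.no_grad()
def sample_mesh(denoiser, num_faces=800, num_bins=128, num_steps=1000, device="cpu"):
    """Iteratively denoise a fully random quantized triangle soup.

    `denoiser(tokens, timestep)` is assumed to return logits of shape
    (batch, num_tokens, num_bins); this interface is hypothetical.
    """
    # Each triangle contributes 3 vertices x 3 coordinates = 9 discrete tokens.
    tokens = torch.randint(0, num_bins, (1, num_faces * 9), device=device)

    for t in reversed(range(num_steps)):
        timestep = torch.full((1,), t, dtype=torch.long, device=device)
        logits = denoiser(tokens, timestep)              # (1, num_faces*9, num_bins)
        probs = torch.softmax(logits, dim=-1)
        # Simplification: re-sample tokens from the predicted distribution;
        # a full discrete DDPM samples from the posterior q(x_{t-1} | x_t, x_0).
        tokens = torch.multinomial(probs.view(-1, num_bins), 1).view(1, -1)

    # Map bin indices back to continuous coordinates in [-1, 1].
    coords = tokens.float() / (num_bins - 1) * 2.0 - 1.0
    return coords.view(num_faces, 3, 3)                  # (faces, vertices, xyz)
```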

Technical Approach

PolyDiff introduces a strategy in which meshes are represented as quantized triangle soups that include both vertex and face data. Throughout the noising and denoising process, the discrete nature of the representation is preserved, which is crucial because the quantized vertex coordinates and the face structure are inherently discrete. By training a network to map from noised to clean meshes, PolyDiff learns the complex distribution of realistic 3D shapes. Evaluation metrics such as Minimum Matching Distance (MMD), Coverage (COV), 1-Nearest-Neighbor Accuracy (1-NNA), and Jensen-Shannon Divergence (JSD) are used to assess the quality of generated shapes, showing that PolyDiff produces cleaner and more diverse meshes than previous methods.
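
The sketch below illustrates one plausible way to build such a quantized triangle soup and apply a single forward-diffusion corruption step with a uniform categorical transition. The bin count, noise schedule parameter, and array layout are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def mesh_to_quantized_soup(vertices, faces, num_bins=128):
    """vertices: (V, 3) floats normalized to [-1, 1]; faces: (F, 3) vertex indices.

    Returns an (F, 9) integer array: 9 quantized coordinate tokens per triangle.
    """
    bins = np.clip(np.round((vertices + 1.0) / 2.0 * (num_bins - 1)), 0, num_bins - 1)
    soup = bins[faces]                      # (F, 3, 3): each face carries copies of its vertices
    return soup.astype(np.int64).reshape(len(faces), 9)

def corrupt(tokens, beta_t, num_bins=128, seed=0):
    """One forward-diffusion step: with probability beta_t, replace a token
    with a uniformly random bin (a uniform categorical transition)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(tokens.shape) < beta_t
    noise = rng.integers(0, num_bins, size=tokens.shape)
    return np.where(mask, noise, tokens)
```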

Experimental Results

The model is evaluated on the ShapeNet dataset across several object categories and shows clear improvements over prior methods, outperforming the current state of the art by substantial margins in perceived quality and shape fidelity. Importantly, the model does not simply memorize training samples but generates novel mesh structures. Qualitative results, illustrated with comparative visual samples, show that PolyDiff produces cleaner and more realistic meshes than competing methods, which suffer from issues such as over-smoothing and incomplete structures.
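
A common sanity check for memorization, independent of the paper's exact protocol, is to retrieve each generated shape's nearest neighbor in the training set under Chamfer distance on sampled surface points; consistently large distances argue against simple copying. The brute-force sketch below assumes point clouds have already been sampled from the meshes, and the function names are illustrative.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a (N, 3) and b (M, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)   # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def nearest_training_distances(generated, training_set):
    """For each generated point cloud, the Chamfer distance to its closest training shape."""
    return [min(chamfer_distance(g, t) for t in training_set) for g in generated]
```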

Conclusions and Future Directions

PolyDiff presents a groundbreaking approach to 3D content generation, offering a robust solution for producing polygonal meshes that are ready for integration into downstream workflows. It holds promise for dramatically reducing the workload of 3D artists, enhancing creativity in the design process, and enabling applications in entertainment and virtual environments. Like most models, PolyDiff has limitations, such as the difficulty of scene-level generation and the inherently slow sampling of diffusion models. Future work could focus on addressing these aspects and extending the model to more complex and diverse generative tasks.
