- The paper introduces PolyGen, an autoregressive Transformer model that sequentially predicts mesh vertices and faces for 3D mesh generation.
- It achieves strong modeling performance: 2.46 bits per vertex (85.1% predictive accuracy) for the vertex model and 1.82 bits per vertex (90% accuracy) for the face model, outperforming uniform and Draco compression baselines.
- Conditional mesh generation with PolyGen broadens its applications, enabling efficient 3D model synthesis for virtual simulations and robotics.
PolyGen: An Autoregressive Generative Model of 3D Meshes
The paper "PolyGen: An Autoregressive Generative Model of 3D Meshes" introduces a novel approach to directly model polygon meshes using deep learning. Polygon meshes, essential in computer graphics and robotics, present challenges for learning-based models because they combine unordered elements with discrete, variable-length face structures. This work addresses these challenges through the proposed model, PolyGen, which leverages a Transformer-based architecture to sequentially predict mesh vertices and faces.
Technical Contributions
PolyGen is structured to explicitly autoregress over 3D mesh data, modeling the meshes as a joint distribution over vertices and faces. The model consists of two primary components:
- Vertex Model: Generates mesh vertices as a flattened sequence of discrete coordinates. Vertex coordinate tuples are ordered as (z, y, x) and concatenated into a single sequence, with a stopping token indicating completion. A masked Transformer decoder captures the non-local dependencies present in mesh geometry.
- Face Model: Generates mesh faces conditioned on the previously generated vertices, combining Transformers with pointer networks so that each face token selects one of the variable number of available vertices that define a polygon face.
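To make the vertex model's input concrete, here is a minimal sketch of how a vertex set might be quantized, sorted, and flattened into a single token sequence with a stopping token. The function name, the 8-bit quantization, the coordinate range of [-0.5, 0.5], and the choice of stop-token id are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def flatten_vertices(vertices, n_bits=8):
    """Quantize and flatten a vertex array into a 1-D token sequence.

    vertices: (V, 3) float array, coordinates assumed in [-0.5, 0.5].
    Returns the sorted vertices' (z, y, x) coordinates concatenated
    into one sequence, followed by a stopping token.
    (Illustrative sketch; details differ from the paper's code.)
    """
    n_levels = 2 ** n_bits
    # Quantize each coordinate to one of n_levels discrete bins.
    quantized = np.clip(
        ((vertices + 0.5) * n_levels).astype(np.int64), 0, n_levels - 1)
    # Reorder columns to (z, y, x) and sort lexicographically (z primary)
    # so the sequence order is deterministic.
    zyx = quantized[:, [2, 1, 0]]
    order = np.lexsort((zyx[:, 2], zyx[:, 1], zyx[:, 0]))
    flat = zyx[order].reshape(-1)
    stop_token = n_levels  # one id beyond the coordinate vocabulary
    return np.concatenate([flat, [stop_token]])

verts = np.array([[0.25, -0.1, 0.0],
                  [-0.3, 0.2, 0.4]])
seq = flatten_vertices(verts)
print(seq.shape)  # (7,) -> 2 vertices * 3 coords + 1 stop token
```

The deterministic sort is what lets an autoregressive model treat an inherently unordered vertex set as a sequence.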
The paper emphasizes the advantages of representing meshes with polygons of variable sizes, known as n-gons, over traditional triangle meshes. This reduces redundancy by collapsing flat surfaces into single polygons, while acknowledging that non-planar n-gons require careful triangulation for rendering.
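The triangulation step mentioned above can be illustrated with the simplest scheme, fan triangulation, which splits an n-gon into n - 2 triangles sharing its first vertex. This is a hypothetical helper for illustration; it is adequate for convex, roughly planar faces, whereas non-planar n-gons may need the more careful handling the paper alludes to:

```python
def triangulate_ngon(face):
    """Fan-triangulate a polygon face given as a list of vertex indices.

    Minimal sketch: an n-gon becomes n - 2 triangles that all share the
    first vertex. Assumes a convex, roughly planar polygon; non-planar
    n-gons may require more careful triangulation.
    """
    v0 = face[0]
    return [(v0, face[i], face[i + 1]) for i in range(1, len(face) - 1)]

quad = [4, 7, 9, 2]            # a single quad face (vertex indices)
print(triangulate_ngon(quad))  # [(4, 7, 9), (4, 9, 2)]
```

A quad thus costs one face record as an n-gon but two as triangles, which is exactly the redundancy the n-gon representation avoids.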
Numerical Results
The authors evaluate the model using log-likelihood and predictive accuracy metrics. The experiments show that PolyGen achieves significant improvement over baseline methods, including uniform and Draco compression standards, in terms of bits per vertex and prediction accuracy. For example, the vertex model attains 2.46 bits per vertex at 85.1% accuracy, and the face model reaches 1.82 bits per vertex at 90% accuracy.
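The bits-per-vertex metric is simply the model's negative log-likelihood expressed in bits and averaged over vertices, which makes the comparison to a uniform baseline easy to reason about. A short sketch (the 8-bit quantization and the vertex count are illustrative assumptions):

```python
import math

def bits_per_vertex(total_nll_nats, num_vertices):
    """Convert a summed negative log-likelihood (in nats) into
    bits per vertex, the metric used to compare models."""
    return total_nll_nats / (num_vertices * math.log(2))

# A uniform distribution over 8-bit quantized coordinates assigns each
# of the 3 coordinates log2(256) = 8 bits, i.e. 24 bits per vertex.
uniform_bits = 3 * math.log2(256)
print(uniform_bits)  # 24.0

# Hypothetical numbers: a model with this total NLL over 1000 vertices
# scores 2.46 bits/vertex, the figure reported for the vertex model.
nll_nats = 2.46 * 1000 * math.log(2)
print(round(bits_per_vertex(nll_nats, 1000), 2))  # 2.46
```

Against the 24 bits/vertex uniform baseline, 2.46 bits/vertex corresponds to roughly a tenfold reduction in code length.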
Conditional mesh generation was tested using different context inputs such as object classes, images, and voxels. Even without directly optimizing for mesh reconstruction, PolyGen remained competitive with methods such as AtlasNet while producing noticeably more diverse samples.
Implications and Future Directions
Practically, PolyGen offers enhancements in the automated creation of 3D models used in virtual simulations and robotics. The ability to condition mesh generation on numerous input types broadens its usability in various domains, such as content creation in virtual environments and real-time 3D vision tasks in robotics.
Theoretically, this research contributes to advancements in sequence modeling for discrete geometry, aligning with the broader trend of extending natural language processing innovations like Transformers to complex structured data. Future work might refine the model by exploring higher bit-depths for mesh representation and by improving computational efficiency, a direction the authors begin to explore with alternative vertex model architectures.
PolyGen sets a precedent for new 3D shape synthesis methodologies, potentially inviting further exploration into hybrid models incorporating both graph neural networks and advanced autoregressive techniques to enhance structural coherence and computational efficiency. This foundational research underlines the promise of integrating direct mesh modeling into the neural generative modeling landscape.