- The paper introduces a novel approach that treats mesh extraction as a generation problem using shape-conditioned autoregressive transformers to produce artist-quality 3D meshes.
- It employs a hybrid framework with a VQ-VAE for mesh vocabulary learning and a noise-resistant transformer decoder to robustly generate meshes from point cloud shape conditions.
- Experimental results show that the method produces meshes with hundreds of times fewer faces than conventional extraction methods while achieving competitive precision, streamlining 3D asset production for industries like gaming and film.
Overview of "MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers"
"MeshAnything" addresses a critical bottleneck in the 3D industry by presenting a method to generate Artist-Created Meshes (AMs) from various 3D representations using shape-conditioned autoregressive transformers. This paper introduces a novel perspective by treating mesh extraction as a generation problem rather than a reconstruction one, facilitating the replacement of manually crafted 3D assets with automatically generated ones.
Key Contributions
- Shape-Conditioned AM Generation: The paper proposes Shape-Conditioned AM Generation, emphasizing the creation of meshes that mimic those produced by human artists. Previous methods treated mesh extraction as reconstruction, yielding dense meshes with poor topology that are costly to store, render, and edit.
- MeshAnything Framework: MeshAnything combines a Vector Quantized Variational Autoencoder (VQ-VAE) with a shape-conditioned decoder-only transformer. This hybrid architecture first learns a mesh vocabulary using the VQ-VAE and then trains the transformer for shape-conditioned autoregressive mesh generation (a simplified tokenization sketch follows this list).
- Noise-Resistant Decoder: To enhance mesh generation quality, the paper introduces a noise-resistant decoder that incorporates shape conditions, allowing it to robustly decode even token sequences the transformer predicts poorly.
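At the heart of the generation formulation is the idea of serializing a mesh into a discrete token sequence that a transformer can predict. MeshAnything learns its vocabulary with the VQ-VAE; the sketch below instead shows direct coordinate quantization, the simpler scheme from earlier autoregressive mesh work, purely for intuition (the function name, bin count, and normalization are illustrative assumptions, not the paper's exact tokenizer):

```python
# Minimal sketch: serialize a triangle mesh into a discrete token sequence
# by quantizing vertex coordinates and flattening faces. Illustrative only.
import numpy as np

def mesh_to_tokens(vertices: np.ndarray, faces: np.ndarray, n_bins: int = 128) -> np.ndarray:
    """Quantize vertex coordinates to n_bins and flatten faces into tokens."""
    # Normalize vertices into the unit cube.
    v_min, v_max = vertices.min(axis=0), vertices.max(axis=0)
    normalized = (vertices - v_min) / (v_max - v_min).max()
    # Map each coordinate to an integer bin (one token per coordinate).
    quantized = np.clip((normalized * n_bins).astype(np.int64), 0, n_bins - 1)
    # Serialize face by face: each triangle contributes 9 coordinate tokens.
    return quantized[faces].reshape(-1)

# Toy example: a single triangle yields 9 tokens in [0, 127].
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
print(mesh_to_tokens(verts, faces))
```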
Methodological Innovations
Data Preparation and Shape Encoding
- The authors leverage point clouds as the shape condition representation because they are continuous, explicit, and easy to obtain from various 3D representations.
- Training pairs are built by sampling point clouds from ground-truth meshes and intentionally degrading their quality, so the shape conditions mimic the imperfect inputs encountered in real-world applications (sketched below).
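A minimal sketch of how such a training pair might be constructed, assuming area-weighted surface sampling and Gaussian jitter as the quality degradation (the paper's exact recipe may differ; all names here are illustrative):

```python
# Minimal sketch: sample a point cloud from a ground-truth mesh, then
# degrade it with noise to mimic imperfect real-world inputs.
import numpy as np

def sample_point_cloud(vertices, faces, n_points=4096, noise_std=0.01, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    tris = vertices[faces]                                    # (F, 3, 3)
    # Sample faces proportionally to their area.
    areas = 0.5 * np.linalg.norm(
        np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0]), axis=1)
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Uniform barycentric coordinates via the square-root trick.
    u, v = rng.random(n_points), rng.random(n_points)
    su = np.sqrt(u)
    bary = np.stack([1 - su, su * (1 - v), su * v], axis=1)   # (N, 3)
    points = (bary[:, :, None] * tris[idx]).sum(axis=1)
    # Intentional quality reduction: jitter the sampled points.
    return points + rng.normal(scale=noise_std, size=points.shape)
```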
VQ-VAE for Mesh Vocabulary Learning
- The VQ-VAE uses transformers for both its encoder and decoder, departing from the graph convolutional networks of prior work (the core quantization step is sketched after this list).
- After initial training, a fine-tuning stage incorporates shape conditions into the decoder, enhancing its resilience to noisy token sequences.
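For intuition, the quantization step at the core of any VQ-VAE snaps each encoder output to its nearest codebook entry; the resulting indices form the discrete mesh vocabulary that the transformer later predicts. A minimal sketch, with assumed codebook size and latent dimension:

```python
# Minimal sketch of VQ-VAE vector quantization: nearest-codebook lookup.
import numpy as np

def quantize(latents: np.ndarray, codebook: np.ndarray):
    """Map each latent (N, D) to the index of its nearest code (K, D)."""
    # Squared distances between every latent and every codebook entry.
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d.argmin(axis=1)           # discrete tokens
    return indices, codebook[indices]    # tokens and their embeddings

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))    # K=512 codes, D=64 dims (assumed)
latents = rng.normal(size=(10, 64))
tokens, quantized = quantize(latents, codebook)
```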
Shape-Conditioned Autoregressive Transformer
- The transformer is augmented with shape condition tokens derived from an encoder pretrained on point clouds. This integration enables the autoregressive model to generate meshes that adhere closely to the provided shapes.
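A minimal sketch of how shape conditioning can enter a decoder-only transformer: condition embeddings (here random stand-ins for the pretrained point-cloud encoder's output) are prepended to the token embeddings, and a GPT-style causal mask is applied over the whole sequence. All dimensions, module choices, and the masking scheme are assumptions, not the paper's exact architecture:

```python
# Minimal sketch: shape-conditioned decoder-only transformer.
import torch
import torch.nn as nn

VOCAB, DIM, COND_LEN = 1024, 256, 16

class CondTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, cond_tokens, mesh_tokens):
        # Prepend shape-condition embeddings to the mesh-token embeddings.
        x = torch.cat([cond_tokens, self.tok_emb(mesh_tokens)], dim=1)
        # GPT-style causal mask over the whole sequence (an assumption).
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.backbone(x, mask=mask)
        # Position COND_LEN - 1 + t predicts mesh token t.
        return self.head(h[:, COND_LEN - 1:-1, :])

model = CondTransformer()
cond = torch.randn(2, COND_LEN, DIM)       # stand-in point-cloud features
tokens = torch.randint(0, VOCAB, (2, 32))
logits = model(cond, tokens)               # (2, 32, VOCAB)
```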
Experimental Validation
Qualitative Performance
- MeshAnything generates AMs with far fewer faces and vertices than reconstruction-based extraction, while maintaining high-quality shape alignment, topology, and geometric detail.
Quantitative Results
- Extensive experiments show that MeshAnything generates meshes with hundreds of times fewer faces than traditional methods like Marching Cubes and Remesh, while achieving competitive precision on metrics such as Chamfer Distance (CD) and Edge Chamfer Distance (ECD); the CD metric is sketched after this list.
- The noise-resistant decoder notably improves the model's robustness to lower-quality token sequences, enhancing overall generated mesh quality.
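For reference, Chamfer Distance is the symmetric average of nearest-neighbor distances between two point sets sampled from the generated and ground-truth surfaces. Conventions vary (squared vs. unsquared distances, normalization); squared distances are assumed in this sketch:

```python
# Minimal sketch of the Chamfer Distance between two point sets.
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3)."""
    d_ab, _ = cKDTree(b).query(a)   # nearest neighbor in b for each point of a
    d_ba, _ = cKDTree(a).query(b)   # nearest neighbor in a for each point of b
    return float((d_ab ** 2).mean() + (d_ba ** 2).mean())

rng = np.random.default_rng(0)
print(chamfer_distance(rng.random((1000, 3)), rng.random((1000, 3))))
```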
Implications and Future Directions
Practical Applications
The practical implications of this research are profound, as MeshAnything enables the efficient generation of high-quality 3D assets for the gaming, film, and burgeoning metaverse industries. By aligning generated meshes to the quality of artist-created assets, this method promises to significantly reduce the labor costs and time associated with 3D model production.
Theoretical Impact and Future Research
The approach of treating mesh extraction as a generation problem opens new avenues for research in 3D asset production. Future work may explore expanding the scalability of MeshAnything to handle large-scale scenes and more complex objects. Additionally, further improvements in model stability and robustness will be essential to transition from theoretical advancements to widespread application.
In conclusion, the MeshAnything framework presents a significant advance in 3D asset production, offering practical solutions for integrating automatically generated meshes into industrial pipelines. By addressing the inefficiencies of previous methods and proposing innovative architectural solutions, this research lays the groundwork for future developments in automated 3D modeling.