InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models (2404.07191v2)
Abstract: We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability. By synergizing the strengths of an off-the-shelf multiview diffusion model and a sparse-view reconstruction model based on the LRM architecture, InstantMesh is able to create diverse 3D assets within 10 seconds. To enhance training efficiency and exploit more geometric supervision, e.g., depths and normals, we integrate a differentiable iso-surface extraction module into our framework and directly optimize on the mesh representation. Experimental results on public datasets demonstrate that InstantMesh significantly outperforms the latest image-to-3D baselines, both qualitatively and quantitatively. We release all the code, weights, and demo of InstantMesh, with the intention that it can make substantial contributions to the community of 3D generative AI and empower both researchers and content creators.
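The pipeline described above is two-stage: a multiview diffusion model turns the input image into a handful of posed views, and a sparse-view LRM-style reconstructor lifts those views to a 3D representation from which a mesh is extracted differentiably. The sketch below illustrates only that data flow under assumed interfaces; the class names (MultiviewDiffusion, SparseViewLRM) and the image_to_mesh helper are hypothetical placeholders rather than the released API, and the iso-surface extraction step is indicated only in a comment.

```python
# Minimal sketch of the two-stage image-to-mesh flow summarized in the abstract.
# All class/function names here are hypothetical placeholders, not the released API.
import torch


class MultiviewDiffusion(torch.nn.Module):
    """Stage 1 (assumed interface): an off-the-shelf multiview diffusion model
    that turns a single input image into a fixed set of posed views."""

    def forward(self, image: torch.Tensor, num_views: int = 6) -> torch.Tensor:
        # Placeholder: a real model would run iterative denoising conditioned
        # on the input image; here we just tile it to get the right shape.
        b, c, h, w = image.shape
        return image.unsqueeze(1).expand(b, num_views, c, h, w)


class SparseViewLRM(torch.nn.Module):
    """Stage 2 (assumed interface): an LRM-style reconstructor that maps the
    sparse views to an implicit field (e.g. a signed-distance grid)."""

    def __init__(self, grid_res: int = 64):
        super().__init__()
        self.grid_res = grid_res

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # Placeholder output: a dense SDF grid that a differentiable
        # iso-surface extractor (e.g. FlexiCubes) would convert to a mesh,
        # allowing direct depth/normal supervision on mesh renderings.
        b = views.shape[0]
        return torch.randn(b, self.grid_res, self.grid_res, self.grid_res)


def image_to_mesh(image: torch.Tensor) -> torch.Tensor:
    views = MultiviewDiffusion()(image)   # single image -> sparse posed views
    sdf_grid = SparseViewLRM()(views)     # sparse views -> implicit field
    return sdf_grid                       # mesh extraction itself is omitted


if __name__ == "__main__":
    img = torch.rand(1, 3, 320, 320)      # dummy input image
    print(image_to_mesh(img).shape)       # torch.Size([1, 64, 64, 64])
```

The key point is the data flow alone: a single feed-forward pass through both stages with no per-shape optimization, which is what allows generation within seconds.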
- Polydiff: Generating 3d polygonal meshes with diffusion models. arXiv preprint arXiv:2312.11417, 2023.
- Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127, 2023.
- Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22246–22256, 2023.
- Bsp-net: Generating compact meshes via binary space partitioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 45–54, 2020.
- V3d: Video diffusion models are effective 3d generators. arXiv preprint arXiv:2403.06738, 2024.
- Sdfusion: Multimodal 3d shape completion, reconstruction, and generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4456–4465, 2023.
- Diffusion-sdf: Conditional generative modeling of signed distance functions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2262–2272, 2023.
- Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023.
- Objaverse-xl: A universe of 10m+ 3d objects. Advances in Neural Information Processing Systems, 36, 2024.
- Google scanned objects: A high-quality dataset of 3d scanned household items. In 2022 International Conference on Robotics and Automation (ICRA), pages 2553–2560. IEEE, 2022.
- 3dgen: Triplane latent diffusion for textured mesh generation. arXiv preprint arXiv:2303.05371, 2023.
- Vfusion3d: Learning scalable 3d generative models from video diffusion models. arXiv preprint arXiv:2403.12034, 2024.
- Openlrm: Open-source large reconstruction models. https://github.com/3DTopia/OpenLRM, 2023.
- LRM: Large reconstruction model for single image to 3d. In The Twelfth International Conference on Learning Representations, 2024.
- Zero-shot text-guided object generation with dream fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 867–876, 2022.
- Shap-e: Generating conditional 3d implicit functions. arXiv preprint arXiv:2305.02463, 2023.
- Spad: Spatially aware multiview diffusers. arXiv preprint arXiv:2402.05235, 2024.
- 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4):1–14, 2023.
- Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. In The Twelfth International Conference on Learning Representations, 2024.
- Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 300–309, 2023.
- One-2-3-45++: Fast single image to 3d objects with consistent multi-view generation and 3d diffusion. arXiv preprint arXiv:2311.07885, 2023.
- One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems, 36, 2024.
- Zero-1-to-3: Zero-shot one image to 3d object. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9298–9309, 2023.
- Syncdreamer: Generating multiview-consistent images from a single-view image. arXiv preprint arXiv:2309.03453, 2023.
- Meshdiffusion: Score-based generative 3d mesh modeling. In The Eleventh International Conference on Learning Representations, 2023.
- Wonder3d: Single image to 3d using cross-domain diffusion. arXiv preprint arXiv:2310.15008, 2023.
- Getmesh: A controllable model for high-quality mesh generation and manipulation. arXiv preprint arXiv:2403.11990, 2024.
- Pc2: Projection-conditioned point cloud diffusion for single-image 3d reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12923–12932, 2023.
- Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4460–4470, 2019.
- Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
- Diffrf: Rendering-guided 3d radiance field diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4328–4338, 2023.
- Point-e: A system for generating 3d point clouds from complex prompts. arXiv preprint arXiv:2212.08751, 2022.
- Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3504–3515, 2020.
- Deep mesh reconstruction from single rgb images via topology modification networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9964–9973, 2019.
- Dreamfusion: Text-to-3d using 2d diffusion. In The Eleventh International Conference on Learning Representations, 2023.
- Magic123: One image to high-quality 3d object generation using both 2d and 3d diffusion priors. arXiv preprint arXiv:2306.17843, 2023.
- Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
- High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
- Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022.
- Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis. Advances in Neural Information Processing Systems, 34:6087–6101, 2021.
- Flexible isosurface extraction for gradient-based mesh optimization. ACM Transactions on Graphics (TOG), 42(4):1–16, 2023.
- Zero123++: A single image to consistent multi-view diffusion base model. arXiv preprint arXiv:2310.15110, 2023.
- Mvdream: Multi-view diffusion for 3d generation. arXiv preprint arXiv:2308.16512, 2023.
- Diffusion-based signed distance fields for 3d shape generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20887–20897, 2023.
- Lgm: Large multi-view gaussian model for high-resolution 3d content creation. arXiv preprint arXiv:2402.05054, 2024.
- Triposr: Fast 3d object reconstruction from a single image. arXiv preprint arXiv:2403.02151, 2024.
- Gecco: Geometrically-conditioned point diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2128–2138, 2023.
- Sv3d: Novel multi-view synthesis and 3d generation from a single image using latent video diffusion. arXiv preprint arXiv:2403.12008, 2024.
- Score jacobian chaining: Lifting pretrained 2d diffusion models for 3d generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12619–12629, 2023.
- Pixel2mesh: Generating 3d mesh models from single rgb images. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–67, 2018.
- Imagedream: Image-prompt multi-view diffusion for 3d generation. arXiv preprint arXiv:2312.02201, 2023.
- Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. Advances in Neural Information Processing Systems, 36, 2024.
- Crm: Single image to 3d textured mesh with convolutional reconstruction model. arXiv preprint arXiv:2403.05034, 2024.
- Omniobject3d: Large-vocabulary 3d object dataset for realistic perception, reconstruction and generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 803–814, 2023.
- Sketch and text guided diffusion model for colored point cloud generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8929–8939, 2023.
- Neurallift-360: Lifting an in-the-wild 2d photo to a 3d object with 360° views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4479–4489, 2023.
- Dream3d: Zero-shot text-to-3d synthesis using 3d shape prior and text-to-image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20908–20918, 2023.
- Grm: Large gaussian reconstruction model for efficient 3d reconstruction and generation. arXiv preprint arXiv:2403.14621, 2024.
- 3dshape2vecset: A 3d shape representation for neural fields and generative diffusion models. ACM Transactions on Graphics (TOG), 42(4):1–16, 2023.
- Locally attentional sdf diffusion for controllable 3d shape generation. ACM Transactions on Graphics (TOG), 42(4):1–13, 2023.
- Mvd2: Efficient multiview 3d reconstruction for multiview diffusion. arXiv preprint arXiv:2402.14253, 2024.
- 3d shape generation and completion through point-voxel diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5826–5835, 2021.
- Triplane meets gaussian splatting: Fast and generalizable single-view 3d reconstruction with transformers. arXiv preprint arXiv:2312.09147, 2023.
- Videomv: Consistent multi-view generation based on large video generative model. arXiv preprint arXiv:2403.12010, 2024.