CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner (2405.14979v1)
Abstract: We present a novel generative 3D modeling system, coined CraftsMan, which can generate high-fidelity 3D geometries with highly varied shapes, regular mesh topologies, and detailed surfaces, and, notably, allows for refining the geometry in an interactive manner. Despite the significant advancements in 3D generation, existing methods still struggle with lengthy optimization processes, irregular mesh topologies, noisy surfaces, and difficulties in accommodating user edits, consequently impeding their widespread adoption and implementation in 3D modeling software. Our work is inspired by the craftsman, who usually roughs out the holistic figure of the work first and elaborates the surface details subsequently. Specifically, we employ a 3D native diffusion model, which operates on latent space learned from latent set-based 3D representations, to generate coarse geometries with regular mesh topology in seconds. In particular, this process takes as input a text prompt or a reference image and leverages a powerful multi-view (MV) diffusion model to generate multiple views of the coarse geometry, which are fed into our MV-conditioned 3D diffusion model for generating the 3D geometry, significantly improving robustness and generalizability. Following that, a normal-based geometry refiner is used to significantly enhance the surface details. This refinement can be performed automatically, or interactively with user-supplied edits. Extensive experiments demonstrate that our method achieves high efficacy in producing superior-quality 3D assets compared to existing methods. HomePage: https://craftsman3d.github.io/, Code: https://github.com/wyysf-98/CraftsMan
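The coarse-to-fine flow described in the abstract can be pictured as three composed stages: a multi-view model, an MV-conditioned 3D model, and a normal-based refiner that runs either automatically or with a user edit. The sketch below only illustrates that data flow under stated assumptions. Every name in it (`Mesh`, `generate`, the `stub_*` functions) is hypothetical and not taken from the CraftsMan code base, the stubs stand in for the real diffusion models, and the choice of four views is an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Mesh:
    """Placeholder mesh that only records which stages produced it."""
    history: List[str] = field(default_factory=list)


# Hypothetical stage signatures (assumptions made for illustration only).
MultiViewModel = Callable[[str], List[str]]       # text prompt or image path -> views
Shape3DModel = Callable[[List[str]], Mesh]        # views -> coarse mesh
Refiner = Callable[[Mesh, Optional[str]], Mesh]   # mesh + optional user edit -> refined mesh


def generate(condition: str,
             mv_model: MultiViewModel,
             shape_model: Shape3DModel,
             refiner: Refiner,
             user_edit: Optional[str] = None) -> Mesh:
    """Chain the three stages: multi-view generation, coarse 3D generation, refinement."""
    views = mv_model(condition)          # stage 1: multiple views of the coarse geometry
    coarse = shape_model(views)          # stage 2: MV-conditioned 3D generation
    return refiner(coarse, user_edit)    # stage 3: normal-based geometry refinement


# Stub stages so the sketch runs end to end; the real stages are learned models.
def stub_mv_model(condition: str) -> List[str]:
    return [f"{condition}@view{i}" for i in range(4)]  # 4 views is an assumed count


def stub_shape_model(views: List[str]) -> Mesh:
    return Mesh(history=[f"coarse from {len(views)} views"])


def stub_refiner(mesh: Mesh, user_edit: Optional[str]) -> Mesh:
    mode = "interactive" if user_edit is not None else "automatic"
    return Mesh(history=mesh.history + [f"{mode} normal-based refinement"])


if __name__ == "__main__":
    result = generate("a wooden chair", stub_mv_model, stub_shape_model, stub_refiner)
    print(result.history)
```

Passing a non-`None` `user_edit` switches the stub refiner to its interactive branch, mirroring the two refinement modes mentioned in the abstract.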