Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models (2311.17050v3)
Abstract: We present Surf-D, a novel method for generating high-quality 3D shapes as Surfaces with arbitrary topologies using Diffusion models. Previous methods explored shape generation with a variety of representations, but they are restricted to limited topologies and produce poor geometric detail. To generate high-quality surfaces of arbitrary topologies, we adopt the Unsigned Distance Field (UDF) as our surface representation. Furthermore, we propose a new pipeline that employs a point-based AutoEncoder to learn a compact and continuous latent space that accurately encodes the UDF and supports high-resolution mesh extraction. We further show that this pipeline significantly outperforms prior approaches to learning distance fields, such as the grid-based AutoEncoder, which is not scalable and is incapable of learning an accurate UDF. In addition, we adopt a curriculum learning strategy to efficiently embed various surfaces. With the pretrained shape latent space, we employ a latent diffusion model to learn the distribution of various shapes. We present extensive experiments using Surf-D for unconditional generation, category-conditional generation, image-conditional generation, and text-to-shape tasks. The experiments demonstrate the superior performance of Surf-D in shape generation across multiple conditioning modalities. Visit our project page at https://yzmblog.github.io/projects/SurfD/.
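To make concrete why an unsigned distance field suits arbitrary topologies, the following is a minimal sketch and not the authors' implementation. It evaluates the UDF of an open surface, a flat unit square patch in the plane z = 0, which has no inside or outside and therefore no well-defined signed distance. The function name `udf_open_square` and the choice of patch are assumptions made purely for illustration.

```python
import math


def udf_open_square(p):
    """Unsigned distance from p = (x, y, z) to the open unit square
    [0, 1] x [0, 1] lying in the plane z = 0 (hypothetical example surface)."""
    x, y, z = p
    # The closest point on the patch is found by clamping x and y into [0, 1]
    # and projecting z onto the plane z = 0.
    cx = min(max(x, 0.0), 1.0)
    cy = min(max(y, 0.0), 1.0)
    return math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + z ** 2)


# Points mirrored across the patch get the same value: the field carries no sign.
above = udf_open_square((0.5, 0.5, 0.25))
below = udf_open_square((0.5, 0.5, -0.25))

# The surface itself is the zero level set of the field.
on_surface = udf_open_square((0.5, 0.5, 0.0))

# Beyond the boundary of the patch, the distance is measured to its edge.
past_edge = udf_open_square((2.0, 0.5, 0.0))

print(above, below, on_surface, past_edge)
```

Because the field is defined purely by distance to the nearest surface point, the same construction applies to closed shapes, open sheets, and multi-layered geometry alike. In the method described above, a learned network stands in for the closed-form function used in this sketch.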