Interactive3D: Create What You Want by Interactive 3D Generation (2404.16510v1)
Abstract: 3D object generation has undergone significant advancements, yielding high-quality results. However, current methods fall short of achieving precise user control, often producing results that do not align with user expectations and thus limiting their applicability. Users envisioning specific 3D objects face significant challenges in realizing their concepts with current generative models, owing to limited interaction capabilities. Existing methods mainly offer two approaches: (i) interpreting textual instructions, with constrained controllability, or (ii) reconstructing 3D objects from 2D images. Both confine customization to the limits of the 2D reference and can introduce undesirable artifacts during the 3D lifting process, restricting the scope for direct and versatile 3D modifications. In this work, we introduce Interactive3D, a framework for interactive 3D generation that grants users precise control over the generative process through extensive 3D interaction capabilities. Interactive3D is constructed in two cascading stages that use distinct 3D representations. The first stage employs Gaussian Splatting for direct user interaction, allowing the generative direction to be modified and guided at any intermediate step through (i) Adding and Removing components, (ii) Deformable and Rigid Dragging, (iii) Geometric Transformations, and (iv) Semantic Editing. The Gaussian splats are then converted into an InstantNGP representation, and in the second stage a novel (v) Interactive Hash Refinement module adds further detail and extracts the geometry. Our experiments demonstrate that Interactive3D markedly improves both the controllability and the quality of 3D generation. Our project webpage is available at \url{https://interactive-3d.github.io/}.
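To make the two-stage control flow concrete, below is a minimal, self-contained sketch of how such a pipeline could be organized. Every name in it (`GaussianSplats`, `optimize_step`, `to_instant_ngp`, `interactive_hash_refinement`) is a hypothetical illustration rather than the paper's API: the actual system optimizes splats against a 2D diffusion prior and refines a multiresolution hash grid, both of which are stubbed out here.

```python
# Hypothetical sketch of the Interactive3D two-stage control flow.
# None of these names come from the paper; the diffusion-guided
# optimization and hash-grid refinement are replaced by stubs.
import numpy as np

class GaussianSplats:
    """Stage 1 representation: a set of 3D Gaussians (means only, for brevity)."""
    def __init__(self, centers: np.ndarray):
        self.centers = centers  # (N, 3) Gaussian centers

    def add_component(self, other: "GaussianSplats") -> None:
        # (i) Adding components: merge another part's Gaussians into this object.
        self.centers = np.concatenate([self.centers, other.centers], axis=0)

    def remove_inside(self, lo: np.ndarray, hi: np.ndarray) -> None:
        # (i) Removing components: drop Gaussians inside an axis-aligned box.
        keep = ~np.all((self.centers >= lo) & (self.centers <= hi), axis=1)
        self.centers = self.centers[keep]

    def rigid_drag(self, mask: np.ndarray, offset: np.ndarray) -> None:
        # (ii) Rigid dragging: translate a user-selected subset as one rigid piece.
        self.centers[mask] += offset

    def transform(self, rotation: np.ndarray, scale: float) -> None:
        # (iii) Geometric transformation applied to the whole object.
        self.centers = scale * self.centers @ rotation.T

def optimize_step(splats: GaussianSplats) -> None:
    # Stub for one step of diffusion-guided optimization (e.g. score
    # distillation); the real objective is not reproduced here.
    splats.centers += 0.0

def to_instant_ngp(splats: GaussianSplats) -> dict:
    # Stage 2 hand-off: convert the splats into an InstantNGP-style field.
    # A dictionary stands in for the multiresolution hash grid.
    return {"hash_grid": splats.centers.copy(), "refined_regions": []}

def interactive_hash_refinement(field: dict, region: tuple) -> dict:
    # (v) Stand-in for Interactive Hash Refinement: mark a user-selected
    # region as deserving extra hash-grid capacity and detail.
    field["refined_regions"].append(region)
    return field

if __name__ == "__main__":
    obj = GaussianSplats(np.random.randn(1000, 3))
    for _ in range(10):  # intermediate steps where the user may intervene
        optimize_step(obj)
    # Example interaction: rigidly drag the +x half of the object.
    obj.rigid_drag(obj.centers[:, 0] > 0, np.array([0.1, 0.0, 0.0]))
    field = to_instant_ngp(obj)
    field = interactive_hash_refinement(field, region=((0, 0, 0), (1, 1, 1)))
```

The key design point this sketch illustrates is that user edits operate directly on the intermediate 3D representation between optimization steps, rather than being routed through text prompts or 2D references.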