
InstructP2P: Learning to Edit 3D Point Clouds with Text Instructions

Published 12 Jun 2023 in cs.CV | (2306.07154v1)

Abstract: Enhancing AI systems to perform tasks following human instructions can significantly boost productivity. In this paper, we present InstructP2P, an end-to-end framework for 3D shape editing on point clouds, guided by high-level textual instructions. InstructP2P extends the capabilities of existing methods by synergizing the strengths of a text-conditioned point cloud diffusion model, Point-E, and powerful LLMs, enabling color and geometry editing using language instructions. To train InstructP2P, we introduce a new shape editing dataset, constructed by integrating a shape segmentation dataset, off-the-shelf shape programs, and diverse edit instructions generated by an LLM, ChatGPT. Our proposed method allows for editing both color and geometry of specific regions in a single forward pass, while leaving other regions unaffected. In our experiments, InstructP2P shows generalization capabilities, adapting to novel shape categories and instructions, despite being trained on a limited amount of data.
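
To make the idea of a region-specific edit concrete, the following is a minimal sketch of how a (source, target) pair for a color edit might be assembled from a part-segmented point cloud. It is an illustration under stated assumptions, not the paper's actual pipeline: the function name `make_color_edit_pair`, the toy data, the part id, and the example instruction string are all hypothetical, and the abstract does not specify how the dataset is actually built beyond the components it lists.

```python
import numpy as np


def make_color_edit_pair(points, colors, part_labels, part_id, new_color):
    """Hypothetical helper: recolor one labeled part, keep everything else.

    points:      (N, 3) xyz coordinates
    colors:      (N, 3) RGB values in [0, 1]
    part_labels: (N,) integer part id per point (e.g. from a segmentation dataset)
    Returns (source, target, mask), where source and target are (N, 6) arrays
    of xyz + rgb and mask marks the edited points.
    """
    points = np.asarray(points, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    part_labels = np.asarray(part_labels)

    # Boolean mask selecting the region the instruction refers to.
    mask = part_labels == part_id

    # Copy so the source colors stay untouched, then overwrite only the region.
    edited = colors.copy()
    edited[mask] = np.asarray(new_color, dtype=np.float64)

    source = np.concatenate([points, colors], axis=1)
    target = np.concatenate([points, edited], axis=1)
    return source, target, mask


# Toy example (all values are assumptions): 8 gray points, part 1 stands for "legs".
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(8, 3))
colors = np.full((8, 3), 0.5)
part_labels = np.array([0, 0, 0, 1, 1, 1, 1, 0])

instruction = "Make the legs red"  # hypothetical instruction text
source, target, mask = make_color_edit_pair(
    points, colors, part_labels, part_id=1, new_color=(1.0, 0.0, 0.0)
)
```

In this sketch the unedited points are identical in `source` and `target`, which mirrors the property stated in the abstract that regions outside the edit are left unaffected. Geometry edits would additionally change the xyz columns and are not covered here.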
