
GECCO: Geometrically-Conditioned Point Diffusion Models

Published 10 Mar 2023 in cs.CV (arXiv:2303.05916v2)

Abstract: Diffusion models that generate images conditioned on text, such as DALL-E 2 and Stable Diffusion, have recently made a splash far beyond the computer vision community. Here, we tackle the related problem of generating point clouds, both unconditionally and conditioned on images. For the latter, we introduce a novel geometrically-motivated conditioning scheme based on projecting sparse image features into the point cloud and attaching them to each individual point, at every step in the denoising process. This approach improves geometric consistency and yields greater fidelity than current methods relying on unstructured, global latent codes. Additionally, we show how to apply recent continuous-time diffusion schemes. Our method performs on par with or above the state of the art in conditional and unconditional experiments on synthetic data, while being faster, lighter, and delivering tractable likelihoods. We show it can also scale to diverse indoor scenes.
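The core of the conditioning scheme described in the abstract is geometric: each 3D point is projected into the conditioning image, and the image feature at the projected pixel is attached to that point before every denoising step. The sketch below illustrates this projection-and-lookup idea in NumPy under our own simplifying assumptions (pinhole camera with intrinsics `K`, points already in camera coordinates, nearest-neighbour feature lookup); function and variable names are ours, not the paper's, and the actual method may differ in details such as interpolation and feature extraction.

```python
import numpy as np

def attach_image_features(points, feat_map, K):
    """Hypothetical sketch of geometric conditioning: project each
    3D point into the image plane and attach the feature vector at
    the projected pixel to that point.

    points:   (N, 3) array of points in camera coordinates
    feat_map: (H, W, C) dense image feature map
    K:        (3, 3) camera intrinsics matrix
    returns:  (N, 3 + C) points with per-point image features
    """
    # Perspective projection: homogeneous pixel = K @ p, then divide by depth
    proj = points @ K.T                       # (N, 3)
    px = proj[:, :2] / proj[:, 2:3]           # (N, 2) pixel coordinates

    H, W, C = feat_map.shape
    # Nearest-neighbour lookup, clamped to the image bounds
    u = np.clip(np.round(px[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(px[:, 1]).astype(int), 0, H - 1)
    feats = feat_map[v, u]                    # (N, C) sampled features

    # Concatenate coordinates with their sampled features,
    # yielding the per-point conditioning input for the denoiser
    return np.concatenate([points, feats], axis=1)
```

Because the lookup is recomputed from the current (noisy) point positions, the conditioning stays geometrically tied to where each point projects in the image at every step of the reverse diffusion process, unlike a single global latent code.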
