Explore In-Context Learning for 3D Point Cloud Understanding (2306.08659v2)

Published 14 Jun 2023 in cs.CV

Abstract: With the rise of large-scale models trained on broad data, in-context learning has emerged as a new learning paradigm with significant potential in natural language processing and computer vision tasks. However, in-context learning remains largely unexplored in the 3D point cloud domain. Although masked modeling has been successfully applied to in-context learning in 2D vision, directly extending it to 3D point clouds remains a formidable challenge. For point clouds, the tokens are themselves the point positions (coordinates), which are masked during inference. Moreover, the position embeddings used in previous works may inadvertently introduce information leakage. To address these challenges, we introduce a novel framework, named Point-In-Context, designed specifically for in-context learning in 3D point clouds, where both inputs and outputs are modeled as coordinates for each task. Additionally, we propose the Joint Sampling module, carefully designed to work in tandem with the general point sampling operator, effectively resolving the aforementioned technical issues. We conduct extensive experiments to validate the versatility and adaptability of the proposed method across a wide range of tasks.
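
One reading of the Joint Sampling idea described in the abstract is that the sampling indices are chosen once on an input point cloud and then reused for its paired target cloud, so the two stay point-wise aligned after down-sampling. The sketch below illustrates that reading only; the function names, the NumPy farthest-point-sampling routine, and the shared-index behavior are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def farthest_point_sample(points: np.ndarray, num_samples: int) -> np.ndarray:
    """Greedy farthest point sampling on an (N, 3) array; returns sampled indices."""
    n = points.shape[0]
    selected = np.zeros(num_samples, dtype=np.int64)
    distances = np.full(n, np.inf)
    selected[0] = 0  # deterministic seed point for the sketch
    for i in range(1, num_samples):
        last = points[selected[i - 1]]
        # Distance of every point to its nearest already-selected point.
        distances = np.minimum(distances, np.linalg.norm(points - last, axis=1))
        selected[i] = int(np.argmax(distances))
    return selected

def joint_sample(input_pc: np.ndarray, target_pc: np.ndarray, num_samples: int):
    """Illustrative 'joint sampling': indices are computed on the input cloud and
    applied to the paired target cloud as well, keeping the two down-sampled
    clouds aligned (an assumption about how the module might behave)."""
    idx = farthest_point_sample(input_pc, num_samples)
    return input_pc[idx], target_pc[idx]

# Toy usage: a synthetic input/target pair of 2048 points reduced to 512 each.
rng = np.random.default_rng(0)
inp = rng.standard_normal((2048, 3)).astype(np.float32)
tgt = inp + 0.01 * rng.standard_normal((2048, 3)).astype(np.float32)
inp_s, tgt_s = joint_sample(inp, tgt, 512)
print(inp_s.shape, tgt_s.shape)  # (512, 3) (512, 3)
```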

