Exploiting GPT-4 Vision for Zero-shot Point Cloud Understanding (2401.07572v1)
Abstract: In this study, we tackle the challenge of classifying object categories in point clouds, a task that previous works such as PointCLIP struggle with due to the inherent limitations of the CLIP architecture. Our approach leverages GPT-4 Vision (GPT-4V) and its advanced generative abilities to overcome these limitations, enabling a more adaptive and robust classification process. We adapt GPT-4V to process complex 3D data, achieving zero-shot recognition without altering the underlying model architecture. Our methodology also includes a systematic strategy for point cloud image visualization, which mitigates the domain gap and enhances GPT-4V's efficiency. Experimental validation demonstrates our approach's superiority in diverse scenarios, setting a new benchmark in zero-shot point cloud classification.
- Zero-shot learning on 3D point cloud objects and beyond. IJCV, 2022.
- Revisiting point cloud shape classification with a simple and effective baseline. In ICML, 2021.
- CLIP2point: Transfer CLIP to point cloud classification with image-depth pre-training. In ICCV, 2023.
- Relation-shape convolutional neural network for point cloud analysis. In CVPR, 2019.
- OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- PointNet: Deep learning on point sets for 3D classification and segmentation. In CVPR, 2017a.
- PointNet++: Deep hierarchical feature learning on point sets in a metric space. NeurIPS, 2017b.
- Contrast with reconstruct: Contrastive 3D representation learning guided by generative pretraining. In ICML, 2023.
- Learning transferable visual models from natural language supervision. In ICML, 2021.
- A network architecture for point cloud classification via automatic depth images generation. In CVPR, 2018.
- Learning 3D shapes as multi-layered height-maps using 2D convolutional networks. In ECCV, 2018.
- Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In ICCV, 2019.
- 3D ShapeNets: A deep representation for volumetric shapes. In CVPR, 2015.
- Point-BERT: Pre-training 3D point cloud transformers with masked point modeling. In CVPR, 2022.
- Uni3D: A unified baseline for multi-dataset 3D object detection. In CVPR, 2023.
- PointCLIP: Point cloud understanding by CLIP. In CVPR, 2022.
- PointCLIP V2: Adapting CLIP for powerful 3D open-world learning. In ICCV, 2023.