
Open-Pose 3D Zero-Shot Learning: Benchmark and Challenges (2312.07039v2)

Published 12 Dec 2023 in cs.CV

Abstract: With the explosive growth of 3D data, the urgency of using zero-shot learning to facilitate data labeling becomes evident. Recently, methods that transfer language or language-image pre-training models such as Contrastive Language-Image Pre-training (CLIP) to 3D vision have made significant progress on the 3D zero-shot classification task. These methods focus primarily on 3D object classification with an aligned pose; such a setting is, however, rather restrictive, as it overlooks the recognition of 3D objects in the open poses typically encountered in real-world scenarios, such as an overturned chair or a lying teddy bear. To this end, we propose a more realistic and challenging scenario named open-pose 3D zero-shot classification, which focuses on recognizing 3D objects regardless of their orientation. First, we revisit the current research on 3D zero-shot classification and propose two benchmark datasets specifically designed for the open-pose setting. We empirically validate many of the most popular methods on the proposed open-pose benchmark. Our investigation reveals that most current 3D zero-shot classification models perform poorly, indicating substantial room for exploration in this new direction. Furthermore, we study a concise pipeline with an iterative angle refinement mechanism that automatically optimizes an ideal viewing angle for classifying these open-pose 3D objects. In particular, to make the validation more compelling and not limited to existing CLIP-based methods, we also pioneer the exploration of knowledge transfer based on diffusion models. While the proposed solutions can serve as a new benchmark for open-pose 3D zero-shot classification, we discuss the complexities and challenges of this scenario that remain open for further research. The code is publicly available at https://github.com/weiguangzhao/Diff-OP3D.
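
The iterative angle refinement mechanism described above can be sketched as a coarse-to-fine search: render the object from several candidate angles, score each rendering with a zero-shot classifier (e.g. CLIP or a diffusion-based scorer), then narrow the search window around the best candidate. The sketch below is a generic, hypothetical illustration of this idea; the paper's actual Diff-OP3D pipeline may differ in its rendering, scoring, and refinement details, and `score_fn` stands in for the real classifier-confidence function.

```python
import numpy as np

def refine_angle(score_fn, n_candidates=8, n_rounds=3, span=360.0):
    """Coarse-to-fine search over azimuth for the angle that maximizes
    a classification confidence score. A generic sketch of iterative
    angle refinement, not the paper's exact mechanism."""
    center = 0.0
    for _ in range(n_rounds):
        # Sample candidate angles around the current center of the window.
        offsets = np.linspace(-span / 2, span / 2, n_candidates, endpoint=False)
        angles = (center + offsets) % 360.0
        # Score each candidate view; in practice this would render the
        # 3D object at `angle` and query the zero-shot classifier.
        scores = [score_fn(a) for a in angles]
        center = angles[int(np.argmax(scores))]
        span /= n_candidates / 2  # shrink the search window each round
    return center

# Hypothetical confidence function that peaks at an "upright" pose of 120°,
# using circular distance on the angle.
best = refine_angle(lambda a: -min(abs(a - 120.0), 360.0 - abs(a - 120.0)))
```

With three rounds of eight candidates, the search narrows from a 360° window to roughly a one-degree resolution while evaluating only 24 views, which is the practical appeal of refining rather than densely sampling all orientations.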

Authors (8)
  1. Weiguang Zhao
  2. Guanyu Yang
  3. Chaolong Yang
  4. Chenru Jiang
  5. Yuyao Yan
  6. Rui Zhang
  7. Kaizhu Huang
  8. Amir Hussain