Prototype Adaption and Projection for Few- and Zero-shot 3D Point Cloud Semantic Segmentation (2305.14335v1)
Abstract: In this work, we address the challenging tasks of few-shot and zero-shot 3D point cloud semantic segmentation. The success of few-shot semantic segmentation in 2D computer vision is largely driven by pre-training on large-scale datasets such as ImageNet: a feature extractor pre-trained on large-scale 2D data greatly benefits 2D few-shot learning. In contrast, the development of 3D deep learning is hindered by the limited volume and instance modality of datasets, owing to the significant cost of 3D data collection and annotation. This results in less representative features and large intra-class feature variation for few-shot 3D point cloud segmentation, so directly extending popular prototypical methods from 2D few-shot classification/segmentation to 3D point cloud segmentation does not work as well as in the 2D domain. To address this issue, we propose a Query-Guided Prototype Adaption (QGPA) module that adapts prototypes from the support point cloud feature space to the query point cloud feature space. This prototype adaption greatly alleviates the large intra-class feature variation in point clouds and significantly improves few-shot 3D segmentation performance. In addition, to enhance prototype representations, we introduce a Self-Reconstruction (SR) module that encourages each prototype to reconstruct its support mask as faithfully as possible. Moreover, we further consider zero-shot 3D point cloud semantic segmentation, where no support sample is available. To this end, we use category words as semantic information and propose a semantic-visual projection model that bridges the semantic and visual spaces. Our proposed method surpasses state-of-the-art algorithms by considerable margins of 7.90% and 14.82% under the 2-way 1-shot setting on the S3DIS and ScanNet benchmarks, respectively. Code is available at https://github.com/heshuting555/PAP-FZS3D.
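To make the abstract's pipeline concrete, below is a minimal PyTorch sketch of prototypical few-shot point cloud segmentation with a query-guided prototype adaption step and a semantic-to-visual projection for the zero-shot case. It is an illustration under stated assumptions, not the paper's released implementation: the masked average pooling, cross-attention layout, residual update, MLP projector, and cosine-similarity classification are typical design choices standing in for the actual QGPA and projection modules, and the Self-Reconstruction module is omitted (see the linked repository for the real code).

```python
import torch
import torch.nn.functional as F


def masked_average_pooling(feats, masks):
    """Compute one prototype per class by masked average pooling.

    feats: (N_support, D) support point features from a backbone.
    masks: (K, N_support) float binary masks, one row per class.
    Returns (K, D) class prototypes.
    """
    weights = masks / masks.sum(dim=1, keepdim=True).clamp(min=1.0)
    return weights @ feats


class QueryGuidedAdaption(torch.nn.Module):
    """Adapt support prototypes toward the query feature space via
    cross-attention. The attention layout and residual connection here
    are assumptions; the paper's QGPA module may differ."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, prototypes, query_feats):
        p = prototypes.unsqueeze(0)      # (1, K, D): prototypes attend...
        q = query_feats.unsqueeze(0)     # (1, N_query, D): ...over query points
        adapted, _ = self.attn(p, q, q)
        return (p + adapted).squeeze(0)  # residual update of each prototype


class SemanticVisualProjector(torch.nn.Module):
    """Zero-shot stand-in: map category word embeddings (e.g., word2vec)
    into the visual feature space so they can serve as prototypes when
    no support sample exists. A hypothetical two-layer MLP."""

    def __init__(self, word_dim, feat_dim):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(word_dim, feat_dim),
            torch.nn.ReLU(),
            torch.nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, word_emb):         # (K, word_dim) -> (K, feat_dim)
        return self.mlp(word_emb)


def segment(query_feats, prototypes):
    """Label each query point with its most similar prototype (cosine)."""
    sims = F.cosine_similarity(
        query_feats.unsqueeze(1), prototypes.unsqueeze(0), dim=-1)  # (N, K)
    return sims.argmax(dim=1)


# Toy usage with random tensors standing in for backbone features:
feats_s = torch.randn(2048, 192)                  # support point features
masks_s = torch.zeros(2, 2048)                    # 2-way support masks
masks_s[0, :1024] = 1.0
masks_s[1, 1024:] = 1.0
feats_q = torch.randn(2048, 192)                  # query point features

protos = masked_average_pooling(feats_s, masks_s)   # (2, 192)
protos = QueryGuidedAdaption(192)(protos, feats_q)  # adapted prototypes
labels = segment(feats_q, protos)                   # (2048,) class indices
```

In the zero-shot setting, `SemanticVisualProjector` would replace `masked_average_pooling` as the source of prototypes, with the same similarity-based labeling applied afterwards.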
Authors: Shuting He, Xudong Jiang, Wei Jiang, Henghui Ding