Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis (2403.01439v2)
Abstract: Point cloud analysis has achieved outstanding performance by transferring pre-trained point cloud models. However, existing methods for model adaptation usually update all model parameters, i.e., the full fine-tuning paradigm, which is inefficient because it incurs high computational cost (e.g., training GPU memory) and massive storage space. In this paper, we study parameter-efficient transfer learning for point cloud analysis, seeking an ideal trade-off between task performance and parameter efficiency. To achieve this goal, we freeze the parameters of the default pre-trained models and propose the Dynamic Adapter, which generates a dynamic scale for each token according to that token's significance to the downstream task. We further seamlessly integrate the Dynamic Adapter with Prompt Tuning (DAPT) by constructing Internal Prompts, which capture instance-specific features for interaction. Extensive experiments on five challenging datasets demonstrate that the proposed DAPT outperforms its full fine-tuning counterparts while reducing the trainable parameters and training GPU memory by 95% and 35%, respectively. Code is available at https://github.com/LMD0311/DAPT.
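At a high level, the Dynamic Adapter described in the abstract can be understood as a bottleneck adapter whose residual contribution is gated by a per-token scale. The NumPy sketch below illustrates that idea only; the function name, weight shapes, bottleneck width, and gating choice (a sigmoid over a learned projection) are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def dynamic_adapter(tokens, w_down, w_up, w_scale):
    """Bottleneck adapter with a token-wise dynamic scale (illustrative sketch).

    tokens:  (num_tokens, dim)   features from a frozen pre-trained backbone
    w_down:  (dim, bottleneck)   down-projection (trainable)
    w_up:    (bottleneck, dim)   up-projection back to the token dimension
    w_scale: (dim, 1)            predicts one significance scalar per token
    """
    hidden = np.maximum(tokens @ w_down, 0.0)          # down-project + ReLU
    delta = hidden @ w_up                              # adapter's residual update
    scale = 1.0 / (1.0 + np.exp(-(tokens @ w_scale)))  # per-token sigmoid gate in (0, 1)
    return tokens + scale * delta                      # token-wise scaled residual

rng = np.random.default_rng(0)
x = rng.standard_normal((128, 384))                    # e.g. 128 point tokens, dim 384
out = dynamic_adapter(
    x,
    rng.standard_normal((384, 16)) * 0.02,
    rng.standard_normal((16, 384)) * 0.02,
    rng.standard_normal((384, 1)) * 0.02,
)
print(out.shape)  # (128, 384): same shape as the input tokens
```

Because only the small adapter matrices are trainable while the backbone stays frozen, the trainable-parameter count is dominated by `dim * bottleneck` terms rather than the full model, which is the source of the parameter savings the abstract reports.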
- Xin Zhou
- Dingkang Liang
- Wei Xu
- Xingkui Zhu
- Yihan Xu
- Zhikang Zou
- Xiang Bai