
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis (2403.01439v2)

Published 3 Mar 2024 in cs.CV

Abstract: Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models. However, existing methods for model adaptation usually update all model parameters, i.e., full fine-tuning paradigm, which is inefficient as it relies on high computational costs (e.g., training GPU memory) and massive storage space. In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency. To achieve this goal, we freeze the parameters of the default pre-trained models and then propose the Dynamic Adapter, which generates a dynamic scale for each token, considering the token significance to the downstream task. We further seamlessly integrate Dynamic Adapter with Prompt Tuning (DAPT) by constructing Internal Prompts, capturing the instance-specific features for interaction. Extensive experiments conducted on five challenging datasets demonstrate that the proposed DAPT achieves superior performance compared to the full fine-tuning counterparts while significantly reducing the trainable parameters and training GPU memory by 95% and 35%, respectively. Code is available at https://github.com/LMD0311/DAPT.

Authors (7)
  1. Xin Zhou
  2. Dingkang Liang
  3. Wei Xu
  4. Xingkui Zhu
  5. Yihan Xu
  6. Zhikang Zou
  7. Xiang Bai
Citations (11)

Summary

Parameter-Efficient Transfer Learning for Point Cloud Analysis with Dynamic Adapter and Prompt Tuning

The paper "Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis" introduces a novel framework, DAPT, to address the inefficiencies of full fine-tuning methods in point cloud analysis. Point cloud data, prevalent in 3D vision tasks such as autonomous driving and 3D reconstruction, presents challenges due to its irregular and sparse nature. Traditional full fine-tuning approaches to adapt pre-trained point cloud models to downstream tasks demand significant computational resources and storage capacity, motivating the exploration of parameter-efficient transfer learning (PETL) methods.

The presented work aims to strike a balance between task performance and parameter efficiency. Instead of updating all model parameters, DAPT freezes the pre-trained backbone and trains only a lightweight module that integrates a Dynamic Adapter with Prompt Tuning.

The Dynamic Adapter's key idea is to generate a separate scale for each token at runtime, weighting each token by its significance to the downstream task instead of relying on a static, manually set scaling hyperparameter. This per-token scaling helps the model cope with the varying geometric structures and non-uniform point density characteristic of point clouds.
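The idea can be sketched in a few lines: a standard bottleneck adapter whose residual update is weighted by a scale predicted per token. This is a minimal NumPy illustration of the mechanism, not the authors' implementation; the layer names, sizes, and the sigmoid scale head are assumptions for the sketch (see the official repository for the real code).

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamic_adapter(x, w_down, w_up, w_scale):
    """Bottleneck adapter whose residual is weighted by a per-token scale.

    x:       (num_tokens, dim) token features from the frozen backbone
    w_down:  (dim, bottleneck) down-projection
    w_up:    (bottleneck, dim) up-projection
    w_scale: (dim, 1) tiny head predicting one scale per token
    """
    hidden = np.maximum(x @ w_down, 0.0)            # down-project + ReLU
    delta = hidden @ w_up                            # up-project back to dim
    scale = 1.0 / (1.0 + np.exp(-(x @ w_scale)))     # sigmoid -> (num_tokens, 1)
    return x + scale * delta                         # token-wise scaled residual

dim, bottleneck, num_tokens = 32, 8, 5
x = rng.normal(size=(num_tokens, dim))
out = dynamic_adapter(
    x,
    rng.normal(size=(dim, bottleneck)) * 0.02,
    rng.normal(size=(bottleneck, dim)) * 0.02,
    rng.normal(size=(dim, 1)) * 0.02,
)
print(out.shape)  # (5, 32)
```

Only the three small projection matrices would be trained; the backbone producing `x` stays frozen, which is where the parameter savings come from.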

Moreover, DAPT incorporates Internal Prompts derived from the Dynamic Adapter's own outputs, making them instance-specific and more relevant to the task at hand than the externally initialized, static prompts of conventional prompt tuning. This integration lets the model efficiently capture instance-specific features and enhance token interactions within point cloud analysis models.
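The contrast with static prompt tuning can be sketched as follows: instead of learning fixed prompt vectors, the prompts are computed from the instance's own token features. This is an illustrative NumPy sketch under assumed shapes and a mean-pooling summary, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)

def internal_prompts(tokens, w_prompt, num_prompts=2):
    """Build instance-specific prompts from the block's own token features.

    tokens:   (num_tokens, dim) features produced inside the network
    w_prompt: (dim, num_prompts * dim) small projection (the only new weights)
    Returns the token sequence with the generated prompts prepended.
    """
    pooled = tokens.mean(axis=0)                            # (dim,) instance summary
    prompts = (pooled @ w_prompt).reshape(num_prompts, -1)  # (num_prompts, dim)
    return np.concatenate([prompts, tokens], axis=0)

dim, num_tokens, num_prompts = 32, 5, 2
tokens = rng.normal(size=(num_tokens, dim))
w_prompt = rng.normal(size=(dim, num_prompts * dim)) * 0.02
out = internal_prompts(tokens, w_prompt, num_prompts)
print(out.shape)  # (7, 32)
```

Because the prompts are a function of the input instance, two different point clouds yield two different prompt sets from the same (small) set of trainable weights.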

DAPT shows superiority over full fine-tuning counterparts by reducing trainable parameters by 95% and saving up to 35% in GPU memory usage while maintaining or even improving performance. For instance, on challenging datasets like ScanObjectNN PB_50_RS, it achieved a 2.36% increase in accuracy using the Point-BERT baseline. The approach also proved effective in few-shot learning and part segmentation tasks, emphasizing its broad applicability to various 3D vision datasets and contexts.

In terms of broader implications, DAPT contributes to an evolving paradigm within AI: resource-conscious adaptation of large models. The work fits the ongoing push toward efficient machine learning practice under tight computational and storage budgets, and it points toward future work on dynamic scaling and internal feature reuse within parameter-efficient frameworks, offering a path forward for adaptive and scalable AI systems.

Through rigorous experiments, this research makes both practical and theoretical contributions toward efficient fine-tuning regimes. Future work may apply these techniques in other AI domains with similar data challenges and extend them to more complex tasks such as 3D object detection, where the trade-offs between parameter efficiency and task performance are even more pronounced.