Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey (2403.14608v7)

Published 21 Mar 2024 in cs.LG

Abstract: Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. Especially, the expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, particularly over the hardware platforms constrained by computational capabilities. Parameter Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adjusting the large models over the various downstream tasks. In particular, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task or domain while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large-scale LLMs with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the supporting system platform design. In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to providing an extensive survey from an algorithmic standpoint, we also examine various real-world system designs to investigate the implementation costs associated with different PEFT approaches. This survey serves as a valuable resource for researchers aiming to understand both the PEFT algorithm and its system implementation, offering detailed ......

The paper "Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey" (Han et al., 21 Mar 2024 ) provides an extensive overview of Parameter-Efficient Fine-Tuning (PEFT) methods, which have emerged as a crucial technique for adapting large pre-trained models to downstream tasks while mitigating the significant computational costs associated with full model fine-tuning.

The core problem PEFT addresses is the immense scale of large models, often containing billions of parameters. Fine-tuning these models by updating all parameters requires vast computational resources, including high-end hardware, significant memory, and extensive training time, making it impractical for many researchers and practitioners. PEFT offers a practical solution by modifying only a small fraction of the model's parameters or introducing a minimal number of new trainable parameters. This approach substantially reduces computational overhead, memory requirements, and storage for task-specific models.

The survey categorizes PEFT algorithms into four main types based on their operational mechanisms:

  1. Additive PEFT: These methods introduce new, trainable parameters while keeping the original pre-trained model frozen. The additional parameters are strategically placed within the model architecture (a minimal adapter and soft-prompt sketch follows this list).
    • Adapters: Small bottleneck feed-forward networks inserted within Transformer blocks (e.g., after attention or FFN layers). Only the adapter weights are trained. Variations include Serial Adapters, Parallel Adapters, and methods like AdapterFusion and AdaMix for multi-task adaptation.
    • Soft Prompts: Learnable vectors prepended to the input embeddings or inserted into intermediate layers of the Transformer. Prompt tuning and Prefix-tuning are examples, where continuous prompt embeddings are optimized. Only the prompt parameters are trained.
    • Others: Techniques like (IA)³ and SSF introduce learnable scaling and shifting vectors applied to activations. These can often be merged into the original model weights after training, incurring no inference overhead.
  2. Selective PEFT: This category involves fine-tuning only a subset of the existing parameters of the pre-trained model, keeping the majority frozen.
    • Unstructured Masking: A binary mask is applied to individual parameters, selecting which ones to train. Methods like Diff pruning, PaFi, and FishMask use different criteria (magnitude, Fisher information) to determine the mask.
    • Structural Masking: Parameter selection is organized in regular patterns (e.g., per layer, per module, or based on weight groups) to enhance hardware efficiency. Examples include structured variants of Diff pruning and methods like Bitfit (tuning only biases) and Xattn Tuning (tuning only cross-attention layers).
  3. Reparameterized PEFT: This approach constructs a low-dimensional reparameterization of the original model parameters during training, which can then be equivalently transformed back into the original parameter space for inference, ideally without added latency.
    • Low-rank Decomposition: The most prominent example is LoRA (Low-Rank Adaptation), which represents the update to a weight matrix, ΔW, as the product of two low-rank matrices, W_up and W_down. Only W_up and W_down are trained; after training, ΔW = W_up W_down is added to the original weight matrix (a minimal LoRA sketch also follows this list).
    • LoRA Derivatives: Methods like DyLoRA and AdaLoRA dynamically adjust the rank during training, while others like Laplace-LoRA and LoRA+ propose training improvements (a Bayesian treatment and differential learning rates, respectively). Multi-LoRA approaches such as LoRAHub and MOELoRA compose or select among multiple LoRA modules for different tasks or instances. Other reparameterized methods include Compacter and KronA (using Kronecker products), HiWi (applying adapters to weights), VeRA (using shared frozen matrices and trainable vectors), and DoRA (decomposing weights into magnitude and direction).
  4. Hybrid PEFT: These methods combine techniques from multiple categories to leverage their respective advantages. Examples include UniPELT (integrating LoRA, prefix-tuning, and adapters with a gating mechanism) and methods that use Neural Architecture Search (NAS) like NOAH and AUTOPEFT to find optimal combinations of PEFT techniques for specific tasks.
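
To make the additive category concrete, here is a minimal PyTorch sketch of a serial bottleneck adapter and a soft-prompt module. The class names, bottleneck size, and insertion points are illustrative assumptions rather than definitions from the survey; the key property is that only the adapter and prompt parameters are trainable while the backbone stays frozen.

```python
# Minimal sketch of two additive PEFT modules (illustrative, not the survey's code).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Serial adapter: down-project, nonlinearity, up-project, residual add."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)  # start as a near-identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))  # residual around the bottleneck


class SoftPrompt(nn.Module):
    """Prompt tuning: learnable vectors prepended to the input embeddings."""

    def __init__(self, n_tokens: int, d_model: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_tokens, d_model) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)


def mark_only_peft_trainable(model: nn.Module) -> None:
    """Freeze the backbone; leave gradients enabled only for PEFT parameters.

    This relies on the (assumed) convention that PEFT modules carry 'adapter'
    or 'prompt' in their parameter names.
    """
    for name, p in model.named_parameters():
        p.requires_grad = ("adapter" in name) or ("prompt" in name)
```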

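The reparameterized category can be sketched with a LoRA-style linear layer: a frozen weight W is augmented with a trainable low-rank update ΔW = W_up W_down, which is folded back into W after training so inference incurs no extra latency. This is a simplified illustration (no dropout, no per-module targeting), not a reference implementation.

```python
# Minimal LoRA-style linear layer (illustrative sketch).
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pre-trained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        d_out, d_in = base.weight.shape
        self.W_down = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # trainable
        self.W_up = nn.Parameter(torch.zeros(d_out, rank))          # trainable, zero-init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scale * x (W_up W_down)^T
        return self.base(x) + (x @ self.W_down.T @ self.W_up.T) * self.scale

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        """Fold ΔW = W_up W_down into the base weight for zero-overhead inference."""
        self.base.weight += self.scale * (self.W_up @ self.W_down)
        return self.base
```

Only W_down and W_up appear in the optimizer; after `merge()`, the layer is an ordinary nn.Linear again.
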
Beyond algorithmic design, the survey discusses practical considerations for efficient PEFT implementation, focusing on memory and computational bottlenecks; for example, it notes that KV-cache management is crucial for efficient autoregressive inference in LLMs. The paper also highlights model compression techniques applied to PEFT modules themselves or used in conjunction with PEFT:

  • PEFT Pruning: Techniques like AdapterDrop and SparseAdapter prune the parameters of the PEFT modules themselves (e.g., removing adapters or sparsifying LoRA weights) to reduce computational and memory overhead.
  • PEFT Quantization: Quantizing the weights of PEFT modules (e.g., BI-Adapter, PEQA) or the base model weights during PEFT training (e.g., QLoRA, LoftQ, LQ-LoRA, QA-LoRA, BitDelta) significantly reduces memory footprint and can accelerate computation on hardware supporting low-precision arithmetic (a usage sketch follows this list).
  • Memory-efficient PEFT: Methods like Side-Tuning, LST, Res-Tuning, and MEFT focus on reducing the memory required during training, often by minimizing or eliminating the need to store gradients for the entire pre-trained backbone model. Techniques like LoRA-FA optimize memory by freezing certain low-rank components. Other methods like HyperTuning and MeZO explore training without full backpropagation on the large model.
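
As one concrete example of combining quantization with PEFT (referenced in the PEFT Quantization bullet above), the sketch below shows a QLoRA-style setup using the Hugging Face transformers, peft, and bitsandbytes libraries. The checkpoint name and hyperparameters are placeholders, and the exact API surface may vary across library versions; treat this as a hedged usage sketch, not the survey's recipe.

```python
# Hedged sketch of QLoRA-style fine-tuning: 4-bit frozen base model + LoRA adapters.
# Library calls and the checkpoint name are assumptions about the Hugging Face stack,
# not code from the survey; adjust to the library versions you actually have installed.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store frozen base weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
)

model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)  # gradient checkpointing, input grads

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attach LoRA only to attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA factors require gradients
```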

The survey also demonstrates the versatility of PEFT by reviewing its applications beyond traditional NLP tasks and models:

  • PEFT for LLMs -- Beyond the Basics: Applications include visual instruction following (e.g., VL-Adapter, LLaMA-Adapter), continual learning (e.g., AdapterCL, CPT, O-LoRA), and context window extension (e.g., LongLoRA, LongQLoRA, LLoCO) where PEFT helps adapt models to new data, tasks, or longer sequences efficiently.
  • PEFT for ViTs: Applying PEFT methods like Adapters (e.g., AdaptFormer, ST-Adapter, AIM) and Visual Prompt Tuning (VPT) to Vision Transformers for tasks like image classification and video recognition.
  • PEFT for VLAs: Using PEFT (e.g., prompt tuning like CoOp, CoCoOp, MaPLe, TPT; or adapters like CLIP-Adapter, Tip-Adapter) to adapt Vision-Language Alignment models like CLIP for tasks such as open-vocabulary image classification.
  • PEFT for Diffusion Models: Employing PEFT techniques (e.g., gated layers like GLIGEN, fine-tuning copies like ControlNet, LoRA like Concept Sliders, adapters like T2I-Adapter, or soft prompts like Textual Inversion) to adapt pre-trained diffusion models for additional input control or customized content generation.

Finally, the paper addresses system design challenges for deploying and training PEFT models in real-world scenarios.

  • Centralized PEFT Query Serving: Cloud providers serving multiple users require efficient systems to handle diverse PEFT requests on a single large model instance. This involves challenges in batching and scheduling. PetS [pets] is presented as a case study for managing multi-PEFT inference requests through coordinated batching and macro-batch streaming, leveraging kernel fusion for efficiency.
  • Distributed System for PEFT: Training on sensitive user data necessitates distributed approaches where the base model stays on the server and PEFT modules are trained on user devices. DLoRA [gao2024dlora] is an example of a distributed PEFT framework.
  • Multi-PEFT Training: Simultaneously training multiple PEFT instances (possibly for different users or tasks) on shared infrastructure requires efficient memory management, gradient handling, and batching strategies. Frameworks like S-LoRA [slora] and Punica [punica] tackle these challenges by optimizing kernel design and multi-tenant serving for LoRA modules (a toy sketch of batched multi-LoRA computation follows this list). Offsite-Tuning [offsite-tuning] is another case study, focusing on privacy-preserving distributed tuning using a learnable adapter and a compressed emulator.
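
To illustrate the multi-tenant serving problem in the Multi-PEFT Training bullet above, the toy sketch below batches requests that use different LoRA adapters over one shared base weight: the base GEMM is computed once for the whole batch, and each request adds its own low-rank delta. Real systems such as Punica and S-LoRA implement this with custom CUDA kernels, paged adapter memory, and scheduling policies that are not shown here; the dimensions and adapter assignments are made up for illustration.

```python
# Toy sketch of batching requests that use different LoRA adapters on one base weight.
import torch

d_in, d_out, rank, n_adapters, batch = 1024, 1024, 8, 4, 6

W_base = torch.randn(d_out, d_in)                # shared, frozen base weight
A = torch.randn(n_adapters, rank, d_in) * 0.01   # per-adapter down-projections
B = torch.randn(n_adapters, d_out, rank) * 0.01  # per-adapter up-projections

x = torch.randn(batch, d_in)                     # one token per request
adapter_id = torch.tensor([0, 2, 1, 0, 3, 2])    # which adapter each request uses

base_out = x @ W_base.T                          # one shared GEMM for the whole batch

# Gather each request's adapter factors and apply its low-rank update.
A_sel = A[adapter_id]                             # (batch, rank, d_in)
B_sel = B[adapter_id]                             # (batch, d_out, rank)
delta = torch.einsum("bd,brd->br", x, A_sel)      # per-request x @ A_i^T
delta = torch.einsum("br,bor->bo", delta, B_sel)  # per-request (.) @ B_i^T

y = base_out + delta                              # (batch, d_out)
```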

The survey concludes by highlighting future research directions, including the simplification of hyperparameter tuning for PEFT methods, the establishment of unified benchmarks for fair comparison, further enhancing training efficiency through better memory management and compression integration, exploring scaling laws for PEFT with increasingly large models, expanding PEFT applications to new model architectures and tasks, enhancing data privacy in PEFT systems, and investigating the interplay between model compression techniques and PEFT performance on hardware.

References (243)
  1. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. arXiv preprint arXiv:2012.13255 (2020).
  2. Flamingo: a visual language model for few-shot learning. Advances in Neural Information Processing Systems 35 (2022), 23716–23736.
  3. Composable sparse fine-tuning for cross-lingual transfer. arXiv preprint arXiv:2110.07560 (2021).
  4. Vqa: Visual question answering. In Proceedings of the IEEE international conference on computer vision (2015), pp. 2425–2433.
  5. Adapting the linearised laplace model evidence for modern deep learning. In International Conference on Machine Learning (2022), PMLR, pp. 796–821.
  6. Sequential modeling enables scalable learning for large vision models. arXiv preprint arXiv:2312.00785 (2023).
  7. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences 2, 1 (2009), 183–202.
  8. Piqa: Reasoning about physical commonsense in natural language. In Thirty-Fourth AAAI Conference on Artificial Intelligence (2020).
  9. Video generation models as world simulators.
  10. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901.
  11. Speedupnet: A plug-and-play hyper-network for accelerating text-to-image diffusion models. arXiv preprint arXiv:2312.08887 (2023).
  12. Int2.1: Towards fine-tunable quantized large language models with error correction through low-rank adaptation. arXiv preprint arXiv:2306.08162 (2023).
  13. Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage. IEEE Transactions on image processing 7, 3 (1998), 319–335.
  14. A survey of web information extraction systems. IEEE transactions on knowledge and data engineering 18, 10 (2006), 1411–1428.
  15. Federated learning of large language models with parameter-efficient prompt tuning and adaptive optimization. arXiv preprint arXiv:2310.15080 (2023).
  16. Parameter-efficient fine-tuning design spaces. arXiv preprint arXiv:2301.01821 (2023).
  17. MMDetection: Open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019).
  18. Ptp: Boosting stability and performance of prompt tuning with perturbation-based regularizer. arXiv preprint arXiv:2305.02423 (2023).
  19. Punica: Multi-tenant lora serving. arXiv preprint arXiv:2310.18547 (2023).
  20. Autoformer: Searching transformers for visual recognition. In Proceedings of the IEEE/CVF international conference on computer vision (2021), pp. 12270–12280.
  21. Adaptformer: Adapting vision transformers for scalable visual recognition. Advances in Neural Information Processing Systems 35 (2022), 16664–16678.
  22. Extending context window of large language models via positional interpolation. arXiv preprint arXiv:2306.15595 (2023).
  23. An empirical study of training self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision (2021), pp. 9640–9649.
  24. Longlora: Efficient fine-tuning of long-context large language models. arXiv preprint arXiv:2309.12307 (2023).
  25. Vision transformer adapter for dense predictions. arXiv preprint arXiv:2205.08534 (2022).
  26. Unifying vision-and-language tasks via text generation. In International Conference on Machine Learning (2021), PMLR, pp. 1931–1942.
  27. Smop: Towards efficient and effective prompt tuning with sparse mixture-of-prompts. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023), pp. 14306–14316.
  28. Codeprompt: Task-agnostic prefix tuning for program and language generation. In Findings of the Association for Computational Linguistics: ACL 2023 (2023), pp. 5282–5297.
  29. Adaptersoup: Weight averaging to improve generalization of pretrained language models. arXiv preprint arXiv:2302.07027 (2023).
  30. Clark, C., et al. Boolq: Exploring the surprising difficulty of natural yes/no questions. In NAACL (2019).
  31. Think you have solved question answering? try arc, the ai2 reasoning challenge. arXiv:1803.05457v1 (2018).
  32. Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023).
  33. Lifelong learning for question answering with hierarchical prompts. arXiv preprint arXiv:2208.14602 (2022).
  34. Unified low-resource sequence labeling by sample-aware dynamic sparse finetuning. arXiv preprint arXiv:2311.03748 (2023).
  35. Scaling vision transformers to 22 billion parameters. In International Conference on Machine Learning (2023), PMLR, pp. 7480–7512.
  36. Gpt3.int8(): 8-bit matrix multiplication for transformers at scale. Advances in Neural Information Processing Systems 35 (2022), 30318–30332.
  37. Qlora: Efficient finetuning of quantized llms. arXiv preprint arXiv:2305.14314 (2023).
  38. Diffusion models beat gans on image synthesis. Advances in neural information processing systems 34 (2021), 8780–8794.
  39. Sparse low-rank adaptation of pre-trained language models. arXiv preprint arXiv:2311.11696 (2023).
  40. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
  41. Teaching structured vision & language concepts to vision & language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023), pp. 2657–2668.
  42. Krona: Parameter efficient tuning with kronecker adapter. arXiv preprint arXiv:2212.10650 (2022).
  43. The pascal visual object classes (voc) challenge. International journal of computer vision 88 (2010), 303–338.
  44. Diverse data augmentation with diffusions for effective test-time prompt tuning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2023), pp. 2704–2714.
  45. Mixture-of-loras: An efficient multitask tuning for large language models. arXiv preprint arXiv:2403.03432 (2024).
  46. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635 (2018).
  47. Frazier, P. I. A tutorial on bayesian optimization. arXiv preprint arXiv:1807.02811 (2018).
  48. On the effectiveness of parameter-efficient fine-tuning. In Proceedings of the AAAI Conference on Artificial Intelligence (2023), vol. 37, pp. 12799–12807.
  49. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618 (2022).
  50. Concept sliders: Lora adaptors for precise control in diffusion models. arXiv preprint arXiv:2311.12092 (2023).
  51. Clip-adapter: Better vision-language models with feature adapters. International Journal of Computer Vision (2023), 1–15.
  52. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010 (2023).
  53. A unified continual learning framework with general parameter-efficient tuning. arXiv preprint arXiv:2303.10070 (2023).
  54. Cross-attention is all you need: Adapting pretrained transformers for machine translation. arXiv preprint arXiv:2104.08771 (2021).
  55. The reversible residual network: Backpropagation without storing activations. Advances in neural information processing systems 30 (2017).
  56. The "something something" video database for learning and evaluating visual common sense. In Proceedings of the IEEE international conference on computer vision (2017), pp. 5842–5850.
  57. Unbounded cache model for online language modeling with open vocabulary. Advances in neural information processing systems 30 (2017).
  58. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023).
  59. Parameter-efficient transfer learning with diff pruning. arXiv preprint arXiv:2012.07463 (2020).
  60. Lq-lora: Low-rank plus quantized matrix decomposition for efficient language model finetuning. arXiv preprint arXiv:2311.12023 (2023).
  61. A survey on large language models: Applications, challenges, limitations, and practical usage. TechRxiv (2023).
  62. Contrastive diffusion model with auxiliary guidance for coarse-to-fine pet reconstruction. In International Conference on Medical Image Computing and Computer-Assisted Intervention (2023), Springer, pp. 239–249.
  63. Zero-shot referring expression comprehension via structural similarity between images and captions. arXiv preprint arXiv:2311.17048 (2023).
  64. Increasing model capacity for free: A simple strategy for parameter efficient fine-tuning. In The Twelfth International Conference on Learning Representations (2023).
  65. Lora+: Efficient low rank adaptation of large models. arXiv preprint arXiv:2402.12354 (2024).
  66. Sensitivity-aware visual parameter-efficient fine-tuning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2023), pp. 11825–11835.
  67. Towards a unified view of parameter-efficient transfer learning. arXiv preprint arXiv:2110.04366 (2021).
  68. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (2022), pp. 16000–16009.
  69. SparseAdapter: An easy approach for improving the parameter-efficiency of adapters. In Findings of the Association for Computational Linguistics: EMNLP 2022 (Abu Dhabi, United Arab Emirates, Dec. 2022), Association for Computational Linguistics, pp. 2184–2190.
  70. Mera: Merging pretrained adapters for few-shot learning. arXiv preprint arXiv:2308.15982 (2023).
  71. Parameter-efficient model adaptation for vision transformers. In Proceedings of the AAAI Conference on Artificial Intelligence (2023), vol. 37, pp. 817–825.
  72. Structured pruning adapters. arXiv preprint arXiv:2211.10155 (2022).
  73. Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020), 6840–6851.
  74. A comprehensive survey of deep learning for image captioning. ACM Computing Surveys (CsUR) 51, 6 (2019), 1–36.
  75. Parameter-efficient transfer learning for nlp. In International Conference on Machine Learning (2019), PMLR, pp. 2790–2799.
  76. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021).
  77. Sparse structure search for parameter-efficient tuning. arXiv preprint arXiv:2206.07382 (2022).
  78. Llm-adapters: An adapter family for parameter-efficient fine-tuning of large language models. arXiv preprint arXiv:2304.01933 (2023).
  79. Vl-pet: Vision-and-language parameter-efficient tuning via granularity control. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2023), pp. 3010–3020.
  80. Lorahub: Efficient cross-task generalization via dynamic lora composition. arXiv preprint arXiv:2307.13269 (2023).
  81. Clip2point: Transfer clip to point cloud classification with image-depth pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2023), pp. 22157–22167.
  82. Mvp-tuning: Multi-view knowledge retrieval with prompt tuning for commonsense reasoning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2023), pp. 13417–13432.
  83. Scaling up visual and vision-language representation learning with noisy text supervision. In International conference on machine learning (2021), PMLR, pp. 4904–4916.
  84. Visual prompt tuning. In European Conference on Computer Vision (2022), Springer, pp. 709–727.
  85. Res-tuning: A flexible and efficient tuning paradigm via unbinding tuner from backbone. arXiv preprint arXiv:2310.19859 (2023).
  86. Convolutional bypasses are better vision transformer adapters. arXiv preprint arXiv:2207.07039 (2022).
  87. Revisiting the parameter efficiency of adapters from the perspective of precision redundancy. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2023), pp. 17217–17226.
  88. Parameter-efficient tuning for large language model without calculating its gradients. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023), pp. 321–330.
  89. Prompting visual-language models for efficient video understanding. In European Conference on Computer Vision (2022), Springer, pp. 105–124.
  90. Gear: An efficient kv cache compression recipe for near-lossless generative inference of llm. arXiv preprint arXiv:2403.05527 (2024).
  91. Compacter: Efficient low-rank hypercomplex adapter layers. Advances in Neural Information Processing Systems 34 (2021), 1022–1035.
  92. The kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017).
  93. Maple: Multi-modal prompt learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023), pp. 19113–19122.
  94. Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization. arXiv preprint arXiv:2305.14152 (2023).
  95. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences 114, 13 (2017), 3521–3526.
  96. Hmdb: a large video database for human motion recognition. In 2011 International conference on computer vision (2011), IEEE, pp. 2556–2563.
  97. Multi-concept customization of text-to-image diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023), pp. 1931–1941.
  98. Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th Symposium on Operating Systems Principles (2023), pp. 611–626.
  99. Neural architecture search for parameter-efficient fine-tuning of large pre-trained language models. arXiv preprint arXiv:2305.16597 (2023).
  100. Lee, S. Toward continual learning for conversational agents. arXiv preprint arXiv:1712.09943 (2017).
  101. Conditional adapters: Parameter-efficient transfer learning with fast inference. arXiv preprint arXiv:2304.04947 (2023).
  102. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691 (2021).
  103. Camel: Communicative agents for "mind" exploration of large language model society. In Thirty-seventh Conference on Neural Information Processing Systems (2023).
  104. Prefix propagation: Parameter-efficient tuning for long sequences. arXiv preprint arXiv:2305.12086 (2023).
  105. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190 (2021).
  106. Prompt tuning pushes farther, contrastive learning pulls closer: A two-stage approach to mitigate social biases. arXiv preprint arXiv:2307.01595 (2023).
  107. Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm. arXiv preprint arXiv:2110.05208 (2021).
  108. Loftq: Lora-fine-tuning-aware quantization for large language models. arXiv preprint arXiv:2310.08659 (2023).
  109. Scaling & shifting your features: A new baseline for efficient model tuning. Advances in Neural Information Processing Systems 35 (2022), 109–123.
  110. Prompts can play lottery tickets well: Achieving lifelong information extraction via lottery prompt tuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2023), pp. 277–292.
  111. Parameter-efficient fine-tuning without introducing new latency. arXiv preprint arXiv:2305.16742 (2023).
  112. Make your pre-trained model reversible: From parameter to memory efficient fine-tuning. arXiv preprint arXiv:2306.00477 (2023).
  113. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13 (2014), Springer, pp. 740–755.
  114. Frozen clip models are efficient video learners. In European Conference on Computer Vision (2022), Springer, pp. 388–404.
  115. Visual instruction tuning. arXiv preprint arXiv:2304.08485 (2023).
  116. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Advances in Neural Information Processing Systems 35 (2022), 1950–1965.
  117. Versatile black-box optimization. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference (2020), pp. 620–628.
  118. Bitdelta: Your fine-tune may only be worth one bit. arXiv preprint arXiv:2402.10193 (2024).
  119. Moelora: An moe-based parameter efficient fine-tuning method for multi-task medical applications. arXiv preprint arXiv:2310.18339 (2023).
  120. Dora: Weight-decomposed low-rank adaptation. arXiv preprint arXiv:2402.09353 (2024).
  121. P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks. arXiv preprint arXiv:2110.07602 (2021).
  122. Late prompt tuning: A late prompt could be better than many prompts. arXiv preprint arXiv:2210.11292 (2022).
  123. Gpt understands, too. arXiv preprint arXiv:2103.10385 (2021).
  124. Inference-time policy adapters (ipa): Tailoring extreme-scale lms without fine-tuning. arXiv preprint arXiv:2305.15065 (2023).
  125. Lcm-lora: A universal stable-diffusion acceleration module. arXiv preprint arXiv:2311.05556 (2023).
  126. Xprompt: Exploring the extreme of prompt tuning. arXiv preprint arXiv:2210.04457 (2022).
  127. MacKay, D. J. A practical bayesian framework for backpropagation networks. Neural computation 4, 3 (1992), 448–472.
  128. Continual learning in task-oriented dialogue systems. arXiv preprint arXiv:2012.15504 (2020).
  129. Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks. arXiv preprint arXiv:2106.04489 (2021).
  130. Proving the lottery ticket hypothesis: Pruning is all you need. In International Conference on Machine Learning (2020), PMLR, pp. 6682–6691.
  131. Fine-tuning language models with just forward passes. arXiv preprint arXiv:2305.17333 (2023).
  132. Peft: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft, 2022.
  133. Unipelt: A unified framework for parameter-efficient language model tuning. arXiv preprint arXiv:2110.07577 (2021).
  134. Periodiclora: Breaking the low-rank bottleneck in lora optimization. arXiv preprint arXiv:2402.16141 (2024).
  135. Can a suit of armor conduct electricity? a new dataset for open book question answering. In EMNLP (2018).
  136. T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. arXiv preprint arXiv:2302.08453 (2023).
  137. Zero-shot temporal action detection via vision-language prompting. In European Conference on Computer Vision (2022), Springer, pp. 681–697.
  138. Expanding language-image pretrained models for general video recognition. In European Conference on Computer Vision (2022), Springer, pp. 1–18.
  139. OpenAI. Gpt-4. In https://openai.com/gpt-4 (2023).
  140. Orhan, E. A simple cache model for image recognition. Advances in Neural Information Processing Systems 31 (2018).
  141. On prefix-tuning for lightweight out-of-distribution detection. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2023), pp. 1533–1545.
  142. Controlling the extraction of memorized data from large language models via prompt-tuning. arXiv preprint arXiv:2305.11759 (2023).
  143. St-adapter: Parameter-efficient image-to-video transfer learning. Advances in Neural Information Processing Systems 35 (2022), 26462–26477.
  144. When do prompting and prefix-tuning work? a theory of capabilities and limitations. arXiv preprint arXiv:2310.19698 (2023).
  145. Adapterfusion: Non-destructive task composition for transfer learning. arXiv preprint arXiv:2005.00247 (2020).
  146. Hypertuning: Toward adapting large language models without back-propagation. In International Conference on Machine Learning (2023), PMLR, pp. 27854–27875.
  147. Adapters: A unified library for parameter-efficient and modular transfer learning, 2023.
  148. Exploring universal intrinsic task subspace via prompt tuning. arXiv preprint arXiv:2110.07867 (2021).
  149. Learning transferable visual models from natural language supervision. In International conference on machine learning (2021), PMLR, pp. 8748–8763.
  150. Qdylora: Quantized dynamic low-rank adaptation for efficient large language model tuning. arXiv preprint arXiv:2402.10462 (2024).
  151. Learning semantic proxies from visual prompts for parameter-efficient fine-tuning in deep metric learning. arXiv preprint arXiv:2402.02340 (2024).
  152. Self-critical sequence training for image captioning. In Proceedings of the IEEE conference on computer vision and pattern recognition (2017), pp. 7008–7024.
  153. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (2022), pp. 10684–10695.
  154. Adapterdrop: On the efficiency of adapters in transformers. arXiv preprint arXiv:2010.11918 (2020).
  155. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023), pp. 22500–22510.
  156. Winogrande: An adversarial winograd schema challenge at scale. Communications of the ACM 64, 9 (2021), 99–106.
  157. Socialiqa: Commonsense reasoning about social interactions. arXiv preprint arXiv:1904.09728 (2019).
  158. S-lora: Serving thousands of concurrent lora adapters. arXiv preprint arXiv:2311.03285 (2023).
  159. Flexgen: High-throughput generative inference of large language models with a single gpu. In International Conference on Machine Learning (2023), PMLR, pp. 31094–31116.
  160. Dept: Decomposed prompt tuning for parameter-efficient fine-tuning. arXiv preprint arXiv:2309.05173 (2023).
  161. Test-time prompt tuning for zero-shot generalization in vision-language models. Advances in Neural Information Processing Systems 35 (2022), 14274–14289.
  162. Flava: A foundational language and vision alignment model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 15638–15650.
  163. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning (2015), PMLR, pp. 2256–2265.
  164. How to train your vit? data, augmentation, and regularization in vision transformers. arXiv preprint arXiv:2106.10270 (2021).
  165. Roformer: Enhanced transformer with rotary position embedding. arXiv preprint arXiv:2104.09864 (2021).
  166. On transferability of prompt tuning for natural language processing. arXiv preprint arXiv:2111.06719 (2021).
  167. Lst: Ladder side-tuning for parameter and memory efficient transfer learning. Advances in Neural Information Processing Systems 35 (2022), 12991–13005.
  168. Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 5227–5237.
  169. Training neural networks with fixed sparse masks. Advances in Neural Information Processing Systems 34 (2021), 24193–24205.
  170. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023).
  171. Dylora: Parameter efficient tuning of pre-trained models using dynamic search-free low-rank adaptation. arXiv preprint arXiv:2210.07558 (2022).
  172. Show and tell: Lessons learned from the 2015 mscoco image captioning challenge. IEEE transactions on pattern analysis and machine intelligence 39, 4 (2016), 652–663.
  173. Spot: Better frozen model adaptation through soft prompt transfer. arXiv preprint arXiv:2110.07904 (2021).
  174. Efficient fine-tuning of bert models on the edge. In 2022 IEEE International Symposium on Circuits and Systems (ISCAS) (2022), IEEE, pp. 1838–1842.
  175. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461 (2018).
  176. Fvqa: Fact-based visual question answering. IEEE transactions on pattern analysis and machine intelligence 40, 10 (2017), 2413–2427.
  177. Aprompt: Attention prompt tuning for efficient adaptation of pre-trained language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023), pp. 9147–9160.
  178. Orthogonal subspace learning for language model continual learning. arXiv preprint arXiv:2310.14152 (2023).
  179. Universality and limitations of prompt tuning. arXiv preprint arXiv:2305.18787 (2023).
  180. Adamix: Mixture-of-adapter for parameter-efficient tuning of large language models. arXiv preprint arXiv:2205.12410 (2022).
  181. P2p: Tuning pre-trained image models for point cloud analysis with point-to-pixel prompting. Advances in neural information processing systems 35 (2022), 14388–14402.
  182. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 139–149.
  183. Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. Advances in neural information processing systems 22 (2009).
  184. Adversarial soft prompt tuning for cross-domain sentiment analysis. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2022), pp. 2438–2447.
  185. Infoprompt: Information-theoretic soft prompt tuning for natural language understanding. arXiv preprint arXiv:2306.04933 (2023).
  186. Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2023), pp. 7623–7633.
  187. Autogen: Enabling next-gen llm applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155 (2023).
  188. Visual question answering: A survey of methods and datasets. Computer Vision and Image Understanding 163 (2017), 21–40.
  189. Idpg: An instance-dependent prompt generation method. arXiv preprint arXiv:2204.04497 (2022).
  190. Offsite-tuning: Transfer learning without full model. arXiv preprint arXiv:2302.04870 (2023).
  191. Simda: Simple diffusion adapter for efficient video generation. arXiv preprint arXiv:2308.09710 (2023).
  192. Gentopia: A collaborative platform for tool-augmented llms. arXiv preprint arXiv:2308.04030 (2023).
  193. Side adapter network for open-vocabulary semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023), pp. 2945–2954.
  194. Raise a child in large language model: Towards effective and generalizable fine-tuning. arXiv preprint arXiv:2109.05687 (2021).
  195. Qa-lora: Quantization-aware low-rank adaptation of large language models. arXiv preprint arXiv:2309.14717 (2023).
  196. Bridging vision and language encoders: Parameter-efficient tuning for referring image segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2023), pp. 17503–17512.
  197. Bayesian low-rank adaptation for large language models. arXiv preprint arXiv:2308.13111 (2023).
  198. Yang, J. Longqlora: Efficient and effective method to extend context length of large language models. arXiv preprint arXiv:2311.04879 (2023).
  199. Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys 56, 4 (2023), 1–39.
  200. Aim: Adapting image models for efficient video action recognition. arXiv preprint arXiv:2302.03024 (2023).
  201. End-to-end open-domain question answering with bertserini. arXiv preprint arXiv:1902.01718 (2019).
  202. Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721 (2023).
  203. Improving visual prompt tuning for self-supervised vision transformers. arXiv preprint arXiv:2306.05067 (2023).
  204. Image captioning with semantic attention. In Proceedings of the IEEE conference on computer vision and pattern recognition (2016), pp. 4651–4659.
  205. Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional clip. arXiv preprint arXiv:2308.02487 (2023).
  206. Pushing mixture of experts to the limit: Extremely parameter efficient moe for instruction tuning. arXiv preprint arXiv:2309.05444 (2023).
  207. Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv preprint arXiv:2106.10199 (2021).
  208. Hellaswag: Can a machine really finish your sentence? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (2019).
  209. Ipdreamer: Appearance-controllable 3d object generation with image prompts. arXiv preprint arXiv:2310.05375 (2023).
  210. One network, many masks: Towards more parameter-efficient transfer learning. arXiv preprint arXiv:2305.17682 (2023).
  211. Root mean square layer normalization. Advances in Neural Information Processing Systems 32 (2019).
  212. Summit: Iterative text summarization via chatgpt. arXiv preprint arXiv:2305.14835 (2023).
  213. Side-tuning: a baseline for network adaptation via additive side networks. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16 (2020), Springer, pp. 698–714.
  214. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2023), pp. 3836–3847.
  215. Lora-fa: Memory-efficient low-rank adaptation for large language models fine-tuning. arXiv preprint arXiv:2308.03303 (2023).
  216. Pruning meets low-rank parameter-efficient fine-tuning. arXiv preprint arXiv:2305.18403 (2023).
  217. Adaptive budget allocation for parameter-efficient fine-tuning. arXiv preprint arXiv:2303.10512 (2023).
  218. Tip-adapter: Training-free clip-adapter for better vision-language modeling. arXiv preprint arXiv:2111.03930 (2021).
  219. Pointclip: Point cloud understanding by clip. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 8552–8562.
  220. Llama-adapter: Efficient fine-tuning of language models with zero-init attention. arXiv preprint arXiv:2303.16199 (2023).
  221. Autolora: Automatically tuning matrix ranks in low-rank adaptation based on meta learning. arXiv preprint arXiv:2403.09113 (2024).
  222. Neural prompt search, 2022.
  223. H2o: Heavy-hitter oracle for efficient generative inference of large language models. Advances in Neural Information Processing Systems 36 (2024).
  224. Towards adaptive prefix tuning for parameter-efficient language model fine-tuning. arXiv preprint arXiv:2305.15212 (2023).
  225. Tuning layernorm in attention: Towards efficient multi-modal llm finetuning. arXiv preprint arXiv:2312.11420 (2023).
  226. Prototype-based hyperadapter for sample-efficient multi-task tuning. arXiv preprint arXiv:2310.11670 (2023).
  227. Infusing hierarchical guidance into prompt tuning: A parameter-efficient framework for multi-level implicit discourse relation recognition. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2023), pp. 6477–6492.
  228. Galore: Memory-efficient llm training by gradient low-rank projection. arXiv preprint arXiv:2403.03507 (2024).
  229. Knowledgeable parameter efficient tuning network for commonsense question answering. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2023), pp. 9051–9063.
  230. Scene parsing through ade20k dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition (2017), pp. 633–641.
  231. Autopeft: Automatic configuration search for parameter-efficient fine-tuning. arXiv preprint arXiv:2301.12132 (2023).
  232. Conditional prompt learning for vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 16816–16825.
  233. Learning to prompt for vision-language models. International Journal of Computer Vision 130, 9 (2022), 2337–2348.
  234. Godec: Randomized low-rank & sparse matrix decomposition in noisy case. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011 (2011).
  235. PetS: A unified framework for parameter-efficient transformers serving. In 2022 USENIX Annual Technical Conference (USENIX ATC 22) (2022), pp. 489–504.
  236. Prompt-aligned gradient for prompt tuning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2023), pp. 15659–15669.
  237. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592 (2023).
  238. Continual prompt tuning for dialog state tracking. arXiv preprint arXiv:2203.06654 (2022).
  239. Multilingual machine translation with large language models: Empirical results and analysis. arXiv preprint arXiv:2304.04675 (2023).
  240. Spt: Learning to selectively insert prompts for better prompt tuning. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023), pp. 11862–11878.
  241. Pointclip v2: Prompting clip and gpt for powerful 3d open-world learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2023), pp. 2639–2650.
  242. Counter-interference adapter for multilingual machine translation. arXiv preprint arXiv:2104.08154 (2021).
  243. Toolqa: A dataset for llm question answering with external tools. arXiv preprint arXiv:2306.13304 (2023).
Authors (5)
  1. Zeyu Han (17 papers)
  2. Chao Gao (122 papers)
  3. Jinyang Liu (51 papers)
  4. Sai Qian Zhang (33 papers)
  5. Jeff Zhang (15 papers)
Citations (156)