
Bayesian Low-rank Adaptation for Large Language Models (2308.13111v5)

Published 24 Aug 2023 in cs.LG

Abstract: Low-rank adaptation (LoRA) has emerged as a new paradigm for cost-efficient fine-tuning of LLMs. However, fine-tuned LLMs often become overconfident, especially when fine-tuned on small datasets. Bayesian methods, with their inherent ability to estimate uncertainty, serve as potent tools to mitigate overconfidence and enhance calibration. In this work, we introduce Laplace-LoRA, which applies a Bayesian approach to the LoRA parameters. Specifically, Laplace-LoRA applies a Laplace approximation to the posterior over the LoRA parameters, considerably improving the calibration of fine-tuned LLMs.

Authors (4)
  1. Adam X. Yang
  2. Maxime Robeyns
  3. Xi Wang
  4. Laurence Aitchison
Citations (30)