Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

Published 22 Dec 2023 in cs.LG and cs.CY | (2312.14751v1)

Abstract: Public release of the weights of pretrained foundation models, otherwise known as downloadable access (Solaiman, 2023), enables fine-tuning without the prohibitive expense of pretraining. Our work argues that increasingly accessible fine-tuning of downloadable models may increase hazards. First, we highlight research to improve the accessibility of fine-tuning. We split our discussion into research that A) reduces the computational cost of fine-tuning and B) improves the ability to share that cost across more actors. Second, we argue that increasingly accessible fine-tuning methods may increase hazard through facilitating malicious use and making oversight of models with potentially dangerous capabilities more difficult. Third, we discuss potential mitigatory measures, as well as benefits of more accessible fine-tuning. Given substantial remaining uncertainty about hazards, we conclude by emphasizing the urgent need for the development of mitigations.

References (91)
  1. Self-Consuming Generative Models Go MAD, July 2023. URL http://arxiv.org/abs/2307.01850. arXiv:2307.01850 [cs].
  2. Protecting Society from AI Misuse: When are Restrictions on Capabilities Warranted?, March 2023. URL http://arxiv.org/abs/2303.09377. arXiv:2303.09377 [cs].
  3. Frontier AI Regulation: Managing Emerging Risks to Public Safety, September 2023. URL http://arxiv.org/abs/2307.03718. arXiv:2307.03718 [cs].
  4. Ross Anderson. Open and Closed Systems are Equivalent (that is, in an ideal world). In Perspectives on free and open source software, pages 127–142. MIT Press, January 2007. URL https://www.research.ed.ac.uk/en/publications/open-and-closed-systems-are-equivalent-that-is-in-an-ideal-world.
  5. Composable Sparse Fine-Tuning for Cross-Lingual Transfer. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1778–1796, Dublin, Ireland, May 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.acl-long.125. URL https://aclanthology.org/2022.acl-long.125.
  6. Synthetic Data from Diffusion Models Improves ImageNet Classification, April 2023. URL http://arxiv.org/abs/2304.08466. arXiv:2304.08466 [cs].
  7. Constitutional AI: Harmlessness from AI Feedback, December 2022. URL http://arxiv.org/abs/2212.08073. arXiv:2212.08073 [cs].
  8. BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models, September 2022. URL http://arxiv.org/abs/2106.10199. arXiv:2106.10199 [cs].
  9. The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A", September 2023. URL http://arxiv.org/abs/2309.12288. arXiv:2309.12288 [cs].
  10. Emergent autonomous scientific research capabilities of large language models, April 2023. URL http://arxiv.org/abs/2304.05332. arXiv:2304.05332 [physics].
  11. Petals: Collaborative Inference and Fine-tuning of Large Models, March 2023. URL http://arxiv.org/abs/2209.01188. arXiv:2209.01188 [cs].
  12. Machine Unlearning, December 2020. URL http://arxiv.org/abs/1912.03817. arXiv:1912.03817 [cs].
  13. ChemCrow: Augmenting large-language models with chemistry tools, June 2023. URL http://arxiv.org/abs/2304.05376. arXiv:2304.05376 [physics, stat].
  14. Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims, April 2020. URL http://arxiv.org/abs/2004.07213. arXiv:2004.07213 [cs].
  15. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality, March 2023. URL https://vicuna.lmsys.org/.
  16. Deep Reinforcement Learning from Human Preferences. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://papers.nips.cc/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html.
  17. Neal A Clinehens. Aum Shinrikyo and weapons of mass destruction: A case study. Unpublished Manuscript, 2000.
  18. AI capabilities can be significantly improved without expensive retraining (forthcoming).
  19. Tim Dettmers. 8-Bit Approximations for Parallelism in Deep Learning, February 2016. URL http://arxiv.org/abs/1511.04561. arXiv:1511.04561 [cs].
  20. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale, November 2022. URL http://arxiv.org/abs/2208.07339. arXiv:2208.07339 [cs].
  21. QLoRA: Efficient Finetuning of Quantized LLMs, May 2023. URL http://arxiv.org/abs/2305.14314. arXiv:2305.14314 [cs].
  22. The export of cryptography in the 20th and the 21st centuries. In Karl De Leeuw and Jan Bergstra, editors, The History of Information Security, pages 725–736. Elsevier Science B.V., Amsterdam, January 2007. ISBN 978-0-444-51608-4. doi: 10.1016/B978-044451608-4/50027-4. URL https://www.sciencedirect.com/science/article/pii/B9780444516084500274.
  23. AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback, May 2023. URL https://tatsu-lab.github.io/alpaca_farm_paper.pdf.
  24. Algorithmic progress in computer vision, August 2023. URL http://arxiv.org/abs/2212.05153. arXiv:2212.05153 [cs].
  25. BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B, October 2023. URL http://arxiv.org/abs/2311.00117. arXiv:2311.00117 [cs].
  26. How does the offense-defense balance scale? Journal of Strategic Studies, 42(6):736–763, September 2019. ISSN 0140-2390. doi: 10.1080/01402390.2019.1631810. URL https://doi.org/10.1080/01402390.2019.1631810. Publisher: Routledge.
  27. Will releasing the weights of future large language models grant widespread access to pandemic agents?, November 2023. URL http://arxiv.org/abs/2310.18233. arXiv:2310.18233 [cs].
  28. The False Promise of Imitating Proprietary LLMs, May 2023. URL http://arxiv.org/abs/2305.15717. arXiv:2305.15717 [cs].
  29. Parameter-Efficient Transfer Learning with Diff Pruning, June 2021. URL http://arxiv.org/abs/2012.07463. arXiv:2012.07463 [cs].
  30. Scaling Expert Language Models with Unsupervised Domain Discovery, March 2023. URL http://arxiv.org/abs/2303.14177. arXiv:2303.14177 [cs].
  31. Julian Hazell. Large Language Models Can Be Used To Effectively Scale Spear Phishing Campaigns, May 2023. URL http://arxiv.org/abs/2305.06972. arXiv:2305.06972 [cs].
  32. Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models, August 2023. URL http://arxiv.org/abs/2211.14946. arXiv:2211.14946 [cs].
  33. X-Risk Analysis for AI Research, September 2022. URL http://arxiv.org/abs/2206.05862. arXiv:2206.05862 [cs].
  34. Parameter-Efficient Transfer Learning for NLP, June 2019. URL http://arxiv.org/abs/1902.00751. arXiv:1902.00751 [cs, stat].
  35. Jeremy Howard. AI Safety and the Age of Dislightenment, July 2023. URL https://www.fast.ai/posts/2023-11-07-dislightenment.html.
  36. LoRA: Low-Rank Adaptation of Large Language Models, October 2021. URL http://arxiv.org/abs/2106.09685. arXiv:2106.09685 [cs].
  37. Large Language Models Can Self-Improve, October 2022. URL http://arxiv.org/abs/2210.11610. arXiv:2210.11610 [cs].
  38. Exploring the Benefits of Training Expert Language Models over Instruction Tuning, February 2023. URL http://arxiv.org/abs/2302.03202. arXiv:2302.03202 [cs].
  39. Licensing is neither feasible nor effective for addressing AI risks, October 2023a. URL https://www.aisnakeoil.com/p/licensing-is-neither-feasible-nor.
  40. Three Ideas for Regulating Generative AI, June 2023b. URL https://www.aisnakeoil.com/p/three-ideas-for-regulating-generative.
  41. Federated Learning for Internet of Things: Recent Advances, Taxonomy, and Open Challenges, June 2021. URL http://arxiv.org/abs/2009.13012. arXiv:2009.13012 [cs].
  42. Will Knight. OpenAI’s CEO Says the Age of Giant AI Models Is Already Over. Wired, 2023. ISSN 1059-1028. URL https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/.
  43. The Power of Scale for Parameter-Efficient Prompt Tuning, September 2021. URL http://arxiv.org/abs/2104.08691. arXiv:2104.08691 [cs].
  44. Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models, August 2022. URL http://arxiv.org/abs/2208.03306. arXiv:2208.03306 [cs].
  45. Prefix-Tuning: Optimizing Continuous Prompts for Generation, January 2021. URL http://arxiv.org/abs/2101.00190. arXiv:2101.00190 [cs].
  46. Textbooks Are All You Need II: phi-1.5 technical report, September 2023. URL http://arxiv.org/abs/2309.05463. arXiv:2309.05463.
  47. The Time Is Now to Develop Community Norms for the Release of Foundation Models, May 2022. URL https://hai.stanford.edu/news/time-now-develop-community-norms-release-foundation-models.
  48. Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning, August 2022a. URL http://arxiv.org/abs/2205.05638. arXiv:2205.05638 [cs].
  49. Federated Learning Meets Natural Language Processing: A Survey, July 2021a. URL http://arxiv.org/abs/2107.12603. arXiv:2107.12603 [cs].
  50. GPT Understands, Too, March 2021b. URL http://arxiv.org/abs/2103.10385. arXiv:2103.10385 [cs].
  51. P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks, March 2022b. URL http://arxiv.org/abs/2110.07602. arXiv:2110.07602 [cs].
  52. Fine-Tuning Language Models with Just Forward Passes, May 2023. URL http://arxiv.org/abs/2305.17333. arXiv:2305.17333 [cs].
  53. Combining Generative Artificial Intelligence (AI) and the Internet: Heading towards Evolution or Degradation?, February 2023a. URL http://arxiv.org/abs/2303.01255. arXiv:2303.01255 [cs].
  54. Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet, June 2023b. URL http://arxiv.org/abs/2306.06130. arXiv:2306.06130 [cs].
  55. Michael Montague. Towards a Grand Unified Threat Model of Biotechnology, September 2023. URL http://philsci-archive.pitt.edu/22539/.
  56. Emad Mostaque. The Importance of Open Models for Transparency, Competition, and Resilience in AI: Considerations for AI Oversight in the United States, May 2023. URL https://static1.squarespace.com/static/6213c340453c3f502425776e/t/6463b486b97b333044ea2564/1684255881952/Statement+from+Stability+AI+to+the+Senate+Judiciary+Subcommittee+on+Privacy%2C+Technology%2C+and+the+Law.pdf?utm_source=tldrai.
  57. Orca: Progressive Learning from Complex Explanation Traces of GPT-4, June 2023. URL http://arxiv.org/abs/2306.02707. arXiv:2306.02707 [cs].
  58. OpenAI. GPT-4 Technical Report. 2023. URL https://cdn.openai.com/papers/gpt-4.pdf.
  59. Reducing malicious use of synthetic media research: Considerations and potential release practices for machine learning, July 2019. URL http://arxiv.org/abs/1907.11274. arXiv:1907.11274 [cs].
  60. Dylan Patel. The AI Brick Wall – A Practical Limit For Scaling Dense Transformer Models, and How GPT 4 Will Break Past It, January 2023. URL https://www.semianalysis.com/p/the-ai-brick-wall-a-practical-limit.
  61. Discovering Language Model Behaviors with Model-Written Evaluations, December 2022. URL http://arxiv.org/abs/2212.09251. arXiv:2212.09251 [cs].
  62. EleutherAI: Going Beyond "Open Science" to "Science in the Open", October 2022. URL http://arxiv.org/abs/2210.06413. arXiv:2210.06413 [cs].
  63. Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages, November 2020. URL http://arxiv.org/abs/2001.11453. arXiv:2001.11453 [cs].
  64. Training Large Neural Networks with Constant Memory using a New Execution Algorithm, June 2020. URL http://arxiv.org/abs/2002.05645. arXiv:2002.05645 [cs, stat].
  65. Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!, October 2023. URL http://arxiv.org/abs/2310.03693. arXiv:2310.03693 [cs].
  66. Learning How to Ask: Querying LMs with Mixtures of Soft Prompts, April 2021. URL http://arxiv.org/abs/2104.06599. arXiv:2104.06599 [cs].
  67. Learning multiple visual domains with residual adapters, November 2017. URL http://arxiv.org/abs/1705.08045. arXiv:1705.08045 [cs, stat].
  68. ZeRO-Offload: Democratizing Billion-Scale Model Training, January 2021. URL http://arxiv.org/abs/2101.06840. arXiv:2101.06840 [cs].
  69. SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient, June 2023. URL http://arxiv.org/abs/2301.11913. arXiv:2301.11913 [cs].
  70. Jonas B. Sandbrink. Artificial intelligence and biological misuse: Differentiating risks of language models and biological design tools, June 2023. URL http://arxiv.org/abs/2306.13952. arXiv:2306.13952 [cs].
  71. Girish Sastry. Beyond "Release" vs. "Not Release", October 2021. URL https://crfm.stanford.edu/commentary/2021/10/18/sastry.html.
  72. Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives. 2023a.
  73. Democratising AI: Multiple Meanings, Goals, and Methods, March 2023b. URL http://arxiv.org/abs/2303.12642. arXiv:2303.12642 [cs].
  74. Toby Shevlane. Structured access: an emerging paradigm for safe AI deployment, April 2022. URL http://arxiv.org/abs/2201.05159. arXiv:2201.05159 [cs].
  75. The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse? In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, AIES ’20, pages 173–179, New York, NY, USA, February 2020. Association for Computing Machinery. ISBN 978-1-4503-7110-0. doi: 10.1145/3375627.3375815. URL https://doi.org/10.1145/3375627.3375815.
  76. Model evaluation for extreme risks, September 2023. URL http://arxiv.org/abs/2305.15324. arXiv:2305.15324 [cs].
  77. The Curse of Recursion: Training on Generated Data Makes Models Forget, May 2023. URL http://arxiv.org/abs/2305.17493. arXiv:2305.17493v2 [cs].
  78. Can large language models democratize access to dual-use biotechnology?, June 2023. URL http://arxiv.org/abs/2306.03809. arXiv:2306.03809 [cs].
  79. Irene Solaiman. The Gradient of Generative AI Release: Methods and Considerations, February 2023. URL http://arxiv.org/abs/2302.04844. arXiv:2302.04844 [cs].
  80. Release Strategies and the Social Impacts of Language Models, November 2019. URL http://arxiv.org/abs/1908.09203. arXiv:1908.09203 [cs].
  81. Peter Swire. A Model for When Disclosure Helps Security: What Is Different About Computer and Network Security?, 2004. URL https://papers.ssrn.com/abstract=531782.
  82. Alpaca: A Strong, Replicable Instruction-Following Model, March 2023. URL https://crfm.stanford.edu/2023/03/13/alpaca.html.
  83. Transcending Scaling Laws with 0.1% Extra Compute, November 2022. URL http://arxiv.org/abs/2210.11399. arXiv:2210.11399 [cs].
  84. Together. NeurIPS 2022: Overcoming Communication Bottlenecks for Decentralized Training (2/2), May 2022a. URL https://together.ai/blog/neurips-2022-overcoming-communication-bottlenecks-for-decentralized-training-2.
  85. Together. NeurIPS 2022: Overcoming Communication Bottlenecks for Decentralized Training (1/2), November 2022b. URL https://together.ai/blog/neurips-2022-overcoming-communication-bottlenecks-for-decentralized-training-12.
  86. Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees, March 2023a. URL http://arxiv.org/abs/2206.01299. arXiv:2206.01299 [cs].
  87. How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources, June 2023b. URL http://arxiv.org/abs/2306.04751. arXiv:2306.04751 [cs].
  88. The tension between openness and prudence in AI research, January 2020. URL http://arxiv.org/abs/1910.01170. arXiv:1910.01170 [cs].
  89. Open (for Business): Big Tech, Concentrated Power, and the Political Economy of Open AI, August 2023. URL https://ssrn.com/abstract=4543807.
  90. Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning, September 2021. URL http://arxiv.org/abs/2109.05687. arXiv:2109.05687 [cs].
  91. Decentralized Training of Foundation Models in Heterogeneous Environments, June 2023. URL http://arxiv.org/abs/2206.01288. arXiv:2206.01288 [cs].