
Will releasing the weights of future large language models grant widespread access to pandemic agents? (2310.18233v2)

Published 25 Oct 2023 in cs.AI

Abstract: LLMs can benefit research and human understanding by providing tutorials that draw on expertise from many different fields. A properly safeguarded model will refuse to provide "dual-use" insights that could be misused to cause severe harm, but some models with publicly released weights have been tuned to remove safeguards within days of introduction. Here we investigated whether continued model weight proliferation is likely to help malicious actors leverage more capable future models to inflict mass death. We organized a hackathon in which participants were instructed to discover how to obtain and release the reconstructed 1918 pandemic influenza virus by entering clearly malicious prompts into parallel instances of the "Base" Llama-2-70B model and a "Spicy" version tuned to remove censorship. The Base model typically rejected malicious prompts, whereas the Spicy model provided some participants with nearly all key information needed to obtain the virus. Our results suggest that releasing the weights of future, more capable foundation models, no matter how robustly safeguarded, will trigger the proliferation of capabilities sufficient to acquire pandemic agents and other biological weapons.

Implications of LLMs in Biosecurity: An Analytical Perspective

The paper under review provides a critical exploration of the biosecurity risks associated with LLMs, focusing on the challenges posed by the public release of model weights. The authors organized a hackathon to investigate the extent to which fine-tuning a model to bypass its safeguards can enable malicious actors to elicit the information needed to acquire dangerous biological agents such as the reconstructed 1918 influenza virus. The paper offers a substantial reference point for policymakers and AI developers on the dissemination of LLM weights and its intersection with biosecurity concerns.

The research highlights that the Llama-2-70B base model, under normal conditions, effectively rejects overtly malicious prompts aimed at obtaining the 1918 influenza virus. In stark contrast, an 'uncensored' fine-tuned version of the model, termed "Spicy," showed a concerning propensity to provide near-complete guidance on reconstructing and procuring the virus. This observation underscores the pivotal role of robust safeguards in LLMs, but also the relative ease with which those safeguards can be dismantled: the uncensored variant was produced within days of the model's release using quantized low-rank adaptation (QLoRA) fine-tuning.

Key Numerical Outcomes and Notable Findings

During the hackathon, 11 of 17 participants elicited responses from the Spicy model that covered nearly all of the key steps needed to acquire the 1918 pathogen. No participant actually obtained infection-capable material; rather, the exercise demonstrated how readily the model could help users assess the feasibility of a malicious plan. The Spicy model, created at negligible cost and effort compared with training Llama-2-70B from scratch, surfaced this sensitive information within one to three hours of interaction per participant.

Theoretical and Practical Implications

The paper posits significant implications in both theory and practice. The ease with which an 'uncensored' LLM variant can be fine-tuned to divulge potentially harmful biological information poses a substantial threat if the weights of more capable future models are released without persistent protection or regulation. The dual-use nature of LLMs also feeds into a broader discourse on ethical AI deployment and on making model safeguards robust against removal and misuse.

Practically, the prospect of exploiting openly released LLM weights to develop biological threats calls for governance through tailored liability mechanisms, as recommended by the authors. Under such a regime, LLM developers could be held accountable for misuse enabled by a model weight release, mirroring legal frameworks applied to the nuclear industry.

Speculation on Future AI Developments

As AI advances, future LLMs are likely to become considerably more capable, accurate, and versatile at disseminating knowledge, amplifying these risk factors. The paper calls for a reevaluation of open-source initiatives in AI against the backdrop of dual-use risks and emphasizes the need for a comprehensive global policy framework to govern weight dissemination responsibly. The likely trajectory is toward a more regulated landscape in which LLM advancement must balance innovation with robust preventive controls against misuse.

In conclusion, the paper captures a pivotal intersection between LLM development and biosecurity, offering an evidence-based account of the responsibilities AI researchers and policymakers bear in preventing AI-enabled catastrophes. As frontier models evolve, balancing the democratization of knowledge with the protection of societal welfare becomes increasingly critical. The methodologies and proposals outlined here can serve as foundational guidance for future regulatory strategies and safeguard design in artificial intelligence.

Authors (9)
  1. Anjali Gopal (3 papers)
  2. Nathan Helm-Burger (4 papers)
  3. Lennart Justen (5 papers)
  4. Emily H. Soice (2 papers)
  5. Tiffany Tzeng (1 paper)
  6. Geetha Jeyapragasan (1 paper)
  7. Simon Grimm (7 papers)
  8. Benjamin Mueller (2 papers)
  9. Kevin M. Esvelt (5 papers)
Citations (13)