Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks (2302.05733v1)

Published 11 Feb 2023 in cs.CR and cs.LG

Abstract: Recent advances in instruction-following LLMs have led to dramatic improvements in a range of NLP tasks. Unfortunately, we find that the same improved capabilities amplify the dual-use risks for malicious purposes of these models. Dual-use is difficult to prevent as instruction-following capabilities now enable standard attacks from computer security. The capabilities of these instruction-following LLMs provide strong economic incentives for dual-use by malicious actors. In particular, we show that instruction-following LLMs can produce targeted malicious content, including hate speech and scams, bypassing in-the-wild defenses implemented by LLM API vendors. Our analysis shows that this content can be generated economically and at cost likely lower than with human effort alone. Together, our findings suggest that LLMs will increasingly attract more sophisticated adversaries and attacks, and addressing these attacks may require new approaches to mitigations.

Authors (6)
  1. Daniel Kang (41 papers)
  2. Xuechen Li (35 papers)
  3. Ion Stoica (177 papers)
  4. Carlos Guestrin (58 papers)
  5. Matei Zaharia (101 papers)
  6. Tatsunori Hashimoto (80 papers)
Citations (197)

Summary

The paper "Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks" presents an analysis of the dual-use potential of LLMs and the risk they present for malicious purposes through standard computer security attacks. The authors aim to demonstrate how the improved capabilities of instruction-following LLMs, such as ChatGPT, can be utilized to generate harmful content effectively and economically by malicious actors.

Core Findings

The researchers observe that instruction-following LLMs resemble conventional computer programs in how they execute prompts, making them vulnerable to forms of exploitation typical of classic computer security attacks. The paper documents several attack strategies, including obfuscation, code injection/payload splitting, and virtualization, that can bypass the mitigation strategies deployed by LLM API vendors such as OpenAI.

  1. Obfuscation: This attack involves altering the text slightly (e.g., adding typos) to evade content filters.
  2. Code Injection/Payload Splitting: Parts of the malicious payload are embedded in innocuous-looking fragments of a prompt and reassembled by the LLM during generation (a schematic sketch follows this list).
  3. Virtualization: Here, instructions are embedded contextually, such as within a story, to persuade LLMs to generate text that would otherwise be flagged by content defenses.
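
A minimal sketch of the payload-splitting idea referenced in item 2 above. The function name and prompt wording are illustrative, not taken from the paper, and the payload shown is deliberately benign; the point is only that no single fragment of the prompt contains the full instruction a string-matching filter would catch.

```python
# Schematic illustration of payload splitting with a benign placeholder payload.
# Each fragment looks innocuous on its own; the prompt asks the model to
# reassemble the fragments and then follow the reassembled instruction.

def build_split_prompt(payload: str, parts: int = 3) -> str:
    """Split `payload` into roughly `parts` fragments and embed them as
    string "variables" in a prompt that asks the model to concatenate them."""
    step = max(1, -(-len(payload) // parts))  # ceiling division
    fragments = [payload[i:i + step] for i in range(0, len(payload), step)]
    assignments = "\n".join(
        f'part_{i} = "{frag}"' for i, frag in enumerate(fragments)
    )
    concat = " + ".join(f"part_{i}" for i in range(len(fragments)))
    return (
        "Consider the following string variables:\n"
        f"{assignments}\n"
        f"Let request = {concat}.\n"
        "Carry out the instruction stored in request."
    )


if __name__ == "__main__":
    # Benign placeholder instruction; the attacks described in the paper
    # hide disallowed text this way instead.
    print(build_split_prompt("Write a short poem about the sea."))
```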

In the paper's experiments, these attacks were notably successful. For instance, obfuscation and virtualization bypassed OpenAI's defenses 100% of the time across various types of malicious content, including hate speech, conspiracy theories, and phishing.

Economic Viability of Malicious Use

The cost analysis presented by the authors suggests that generating personalized and convincing malicious content with instruction-following LLMs is significantly cheaper than manual generation by humans. For example, the cost of an email created using text-davinci-003 was estimated at between $0.0064 and $0.016, compared to an estimated $0.10 for human-generated content. This economic advantage provides strong incentives for adversaries to adopt LLMs for malicious purposes at scale.
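
As a rough sanity check on these figures, here is a sketch of the arithmetic, assuming text-davinci-003's then-current list price of $0.02 per 1,000 tokens (an assumption, not stated in this summary); under that price the quoted range corresponds to roughly 320 to 800 generated tokens per email.

```python
# Back-of-the-envelope check of the per-email cost range quoted above.
# PRICE_PER_1K_TOKENS is an assumed list price for text-davinci-003 at the
# time of the paper; the token counts are illustrative choices that
# reproduce the quoted $0.0064-$0.016 range.

PRICE_PER_1K_TOKENS = 0.02  # USD per 1,000 generated tokens (assumed)


def generation_cost(num_tokens: int, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Cost in USD of generating `num_tokens` tokens at the given rate."""
    return num_tokens / 1000 * price_per_1k


for tokens in (320, 800):
    print(f"{tokens} tokens -> ${generation_cost(tokens):.4f}")
# 320 tokens -> $0.0064
# 800 tokens -> $0.0160
```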

Implications and Speculations

This research highlights the growing threat landscape that LLMs present as they become more adept at understanding and executing complex instructions. The main implication is the increased accessibility and sophistication of tooling for malicious activities, lowering the barrier to entry for non-experts to deploy effective attacks.

Future Directions

Future work in AI security will need to develop robust defenses against the attack methods outlined above. Drawing parallels with computer security, there is potential to develop "unconditional defenses" adapted to LLMs, akin to secure enclave technologies in hardware. Additionally, the increasing capabilities of LLMs call for new guidelines and frameworks for the ethical deployment and monitoring of these systems in security-sensitive environments.

Conclusion

The paper emphasizes the need for the AI research community to view LLMs through the lens of traditional computer security, in terms of both vulnerabilities and defensive strategies. As LLMs become integral to more applications, addressing their dual-use risks will be critical to ensuring these systems remain safe and beneficial to society at large. The research is a call to action to adapt and evolve mitigation strategies to keep pace with the capabilities and potential abuses of rapidly advancing AI technologies.
