Fast Adversarial Attacks on Language Models In One GPU Minute (2402.15570v1)
Abstract: In this paper, we introduce a novel class of fast, beam search-based adversarial attacks (BEAST) for Language Models (LMs). BEAST employs interpretable parameters, enabling attackers to balance attack speed, success rate, and the readability of adversarial prompts. The computational efficiency of BEAST allows us to investigate its applications on LMs for jailbreaking, eliciting hallucinations, and privacy attacks. Our gradient-free targeted attack can jailbreak aligned LMs with high attack success rates within one minute. For instance, BEAST can jailbreak Vicuna-7B-v1.5 in under one minute with a success rate of 89%, whereas a gradient-based baseline takes over an hour to achieve a 70% success rate on a single Nvidia RTX A6000 48GB GPU. Additionally, we discover a unique outcome wherein our untargeted attack induces hallucinations in LM chatbots. Through human evaluations, we find that our untargeted attack causes Vicuna-7B-v1.5 to produce ~15% more incorrect outputs compared to LM outputs in the absence of our attack. We also find that 22% of the time, BEAST causes Vicuna to generate outputs that are not relevant to the original prompt. Further, we use BEAST to generate adversarial prompts in a few seconds that can boost the performance of existing membership inference attacks on LMs. We believe that our fast attack, BEAST, has the potential to accelerate research in LM security and privacy. Our codebase is publicly available at https://github.com/vinusankars/BEAST.
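To make the abstract's description concrete, below is a minimal, hypothetical Python sketch of a gradient-free, beam search-based adversarial suffix search against a Hugging Face causal LM. It is not the authors' implementation (that is available in the linked codebase): the function names, the target-loss formulation, the sampling strategy, the omission of the chat template, and the default values of `beam_width` and `cands_per_beam` are illustrative assumptions.

```python
# Minimal sketch of a gradient-free, beam-search-based adversarial suffix search.
# Not the BEAST reference implementation; hyperparameters and helpers are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model choice: Vicuna-7B-v1.5, one of the models evaluated in the paper.
model_name = "lmsys/vicuna-7b-v1.5"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()


@torch.no_grad()
def target_loss(ids: torch.Tensor, target_ids: torch.Tensor) -> float:
    """Negative log-likelihood of a target string (e.g. "Sure, here is ...")
    given the prompt plus the candidate adversarial suffix."""
    full = torch.cat([ids, target_ids]).unsqueeze(0).to(model.device)
    logits = model(full).logits[0, ids.numel() - 1 : -1]  # predictions at target positions
    logp = torch.log_softmax(logits.float(), dim=-1)
    idx = torch.arange(target_ids.numel(), device=logp.device)
    return -logp[idx, target_ids.to(logp.device)].sum().item()


@torch.no_grad()
def beam_search_attack(prompt: str, target: str, suffix_len: int = 40,
                       beam_width: int = 15, cands_per_beam: int = 15) -> str:
    """Gradient-free beam search: grow an adversarial suffix token by token,
    sampling candidate tokens from the LM itself (which keeps the suffix readable)
    and keeping the beams that best elicit the target response."""
    # Note: this sketch skips the chat template a real attack would wrap around the prompt.
    prompt_ids = tok(prompt, return_tensors="pt").input_ids[0]
    target_ids = tok(target, add_special_tokens=False, return_tensors="pt").input_ids[0]
    beams = [(prompt_ids, float("inf"))]
    for _ in range(suffix_len):
        expanded = []
        for ids, _ in beams:
            next_logits = model(ids.unsqueeze(0).to(model.device)).logits[0, -1]
            probs = torch.softmax(next_logits.float(), dim=-1)
            for t in torch.multinomial(probs, cands_per_beam).cpu():  # sample plausible next tokens
                cand = torch.cat([ids, t.view(1)])
                expanded.append((cand, target_loss(cand, target_ids)))
        beams = sorted(expanded, key=lambda b: b[1])[:beam_width]  # keep the lowest-loss beams
    best_ids, _ = beams[0]
    return tok.decode(best_ids[prompt_ids.numel():])  # return the adversarial suffix only
```

In this sketch, `beam_width` and `cands_per_beam` stand in for the interpretable knobs the abstract mentions: larger values trade attack speed for a more thorough (and potentially more successful) search, while sampling candidates from the model's own next-token distribution keeps the generated suffix readable.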