Assessing Adversarial Robustness of Large Language Models: An Empirical Study (2405.02764v2)

Published 4 May 2024 in cs.CL and cs.LG

Abstract: Large language models (LLMs) have revolutionized natural language processing, but their robustness against adversarial attacks remains a critical concern. We present a novel white-box style attack approach that exposes vulnerabilities in leading open-source LLMs, including Llama, OPT, and T5. We assess the impact of model size, structure, and fine-tuning strategies on their resistance to adversarial perturbations. Our comprehensive evaluation across five diverse text classification tasks establishes a new benchmark for LLM robustness. The findings of this study have far-reaching implications for the reliable deployment of LLMs in real-world applications and contribute to the advancement of trustworthy AI systems.

Authors (4)

Zeyu Yang (27 papers)
Zhao Meng (14 papers)
Xiaochen Zheng (29 papers)
Roger Wattenhofer (212 papers)

Citations (4)
