Against The Achilles' Heel: A Survey on Red Teaming for Generative Models (2404.00629v2)

Published 31 Mar 2024 in cs.CL

Abstract: Generative models are rapidly gaining popularity and being integrated into everyday applications, raising concerns over their safe use as various vulnerabilities are exposed. In light of this, the field of red teaming is undergoing fast-paced growth, highlighting the need for a comprehensive survey covering the entire pipeline and addressing emerging topics. Our extensive survey, which examines over 120 papers, introduces a taxonomy of fine-grained attack strategies grounded in the inherent capabilities of LLMs. Additionally, we have developed the "searcher" framework to unify various automatic red teaming approaches. Moreover, our survey covers novel areas including multimodal attacks and defenses, risks around LLM-based agents, overkill of harmless queries, and the balance between harmlessness and helpfulness.

References (345)

Authors (12)

Lizhi Lin (4 papers)
Honglin Mu (11 papers)
Zenan Zhai (10 papers)
Minghan Wang (23 papers)
Yuxia Wang (41 papers)
Renxi Wang (8 papers)
Junjie Gao (14 papers)
Yixuan Zhang (94 papers)
Wanxiang Che (155 papers)
Timothy Baldwin (125 papers)
Xudong Han (40 papers)
Haonan Li (43 papers)

Citations (11)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Tweets

https://twitter.com/ShomLinEd/status/1775218902664151061

https://twitter.com/apalasec/status/1775850859361890706

https://twitter.com/AryehEnglander/status/1780305263901200543

https://twitter.com/utopiah/status/1803312860451189193

HackerNews

A Survey on Red Teaming for Generative Models (16 points, 0 comments)

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models (2404.00629v2)

Summary

Related Papers

Tweets

HackerNews