Scope of Risks Effectively Detectable via AI Red-Teaming
Determine which categories of undesirable behaviors, model limitations, and misuse risks in generative AI systems can or should be effectively detected and mitigated through AI red-teaming exercises.
References
For example, the definition offered by the presidential executive order leaves the following key questions unanswered: What types of undesirable behaviors, limitations, and risks can or should be effectively caught and mitigated through red-teaming exercises?
— Red-Teaming for Generative AI: Silver Bullet or Security Theater?
(arXiv:2401.15897, Feffer et al., 29 Jan 2024), Section 1 (Introduction)