Papers
Topics
Authors
Recent
2000 character limit reached

ReD Setup for AI Red Teaming

Updated 21 November 2025
  • ReD Setup is a comprehensive dual-level methodology that rigorously evaluates both AI models and their broader sociotechnical systems.
  • It employs stage-specific playbooks, adversarial testing, and formal risk metrics to uncover vulnerabilities across the entire AI lifecycle.
  • The approach fosters cross-disciplinary collaboration, integrating cybersecurity, systems theory, and ethics for robust AI deployment.

The ReD Setup in AI Red Teaming refers to a comprehensive, dual-scale methodology for systematically probing vulnerabilities in both AI models and the sociotechnical systems into which they are deployed. Synthesizing principles from cybersecurity, systems theory, and adversarial evaluation, the ReD Setup is explicitly formulated to cover the entire AI development lifecycle, establishing rigorous workflows, formal definitions, and best practices for effective red teaming. It is structured to overcome the limitations of conventional red teaming paradigms, which are predominantly fixed at the model level, by expanding the scope to include emergent risks and interactions at the macro system level (Majumdar et al., 7 Jul 2025).

1. Two-Level Red Teaming Framework

The ReD Setup operationalizes red teaming at two distinct but interacting levels:

1.1 Macro-Level (System) Red Teaming:

This scale spans all seven stages of the AI system lifecycle—inception, design, data, development, deployment, maintenance, and retirement. At each stage, core questions are posed: "What could go wrong in context?" and "Have we baked in resilience?" Objectives include challenging the necessity of AI, scrutinizing socio-technical dependencies, stress-testing data lineage, verifying supply-chain integrity, and auditing retirement procedures for legacy risk exposure.

1.2 Micro-Level (Model) Red Teaming:

This scale focuses on the core model (often an LLM or similar foundational model), emphasizing:

  • Boundary seeking (identification of operational limits)
  • Edge-case generation (revealing unexpected failures or hidden capabilities)
  • Early risk discovery (pre-deployment identification of insecure or harmful outputs)

Tactics, Techniques, and Procedures (TTPs) include prompt engineering and fuzzing, scenario-based adversarial testing, domain-specific attacks (e.g., legal, medical), social engineering simulations, and automated escalation tests.

2. Team Composition and Collaboration

Effective implementation of the ReD Setup is predicated on assembling a multidisciplinary team:

  • Red Team Lead/Program Manager: Responsible for project scoping, resource allocation, and reporting.
  • Technical Experts: ML engineers, prompt engineers, cybersecurity specialists, focusing on micro-level TTP design and code/model vulnerability scanning.
  • Social Scientists & Ethicists: Contextualize harm, especially regarding disparate stakeholder values and community impacts.
  • Threat Analysts & Intelligence Officers: Build adversary emulation scenarios based on real-world TTPs, often referencing MITRE ATT&CK.
  • Domain Specialists: Advise on regulatory, operational, privacy, and compliance implications.
  • Blue Team / TEVV (Test, Evaluation, Verification, Validation) Collaborators: Integrate findings into the organization's testing and validation pipelines to assure mitigation efficacy.

Collaboration mechanisms include joint kickoff workshops (for aligning on objectives), co-creation of multilevel threat models, regular stand-ups for cross-team knowledge transfer, and Purple Team reviews (joint red-blue team tests to verify mitigation and prevent regression).

3. Red Teaming Process and Workflow

3.1 Planning Phase

  • Scope Definition: The coverage vector is defined as

Scope={Stages}×{Components}×{ThreatClasses}\mathit{Scope} = \{\text{Stages}\} \times \{\text{Components}\} \times \{\text{ThreatClasses}\}

  • Risk Prioritization: For each vulnerability vv,

Risk(v)=Pexploit(v)×Impact(v)\text{Risk}(v) = P_{\text{exploit}(v)} \times \text{Impact}(v)

  • Team Assignment: Explicit mapping from roles to critical tasks, highlighting blocking “Red Flags”.

3.2 Execution Phase

Macro-Level

  • Stage-specific playbooks and checklists
  • Scenario simulations (including adversary/end-user role-play)
  • Systems-theoretic analysis, especially Leveson's STPA for identifying unsafe actions

Micro-Level

  • Cataloging a TTP library covering a broad attack surface
  • Automated fuzzy/adversarial prompt generation and measurement
  • Diversity testing (linguistic, cultural, technical perspectives)

3.3 Reporting & Remediation

  • Standardized reporting templates (executive summary, risk matrices, technical appendices)
  • Coordinated disclosure with CVE-style tracking and internal/external routing (“AI Vulnerability Coordination Center”)
  • Remediation tracking with explicit status (Open, Mitigated, Verified) and risk recomputation
  • Feedback loops between macro and micro-level findings

4. Formal Models and Definitions

  • Formal Red Teaming:

RedTeam(S,A){f1,f2,,fn}\mathrm{RedTeam}(S, A) \to \{f_1, f_2, \dots, f_n\}

Where SS is the system, AA adversary profiles, and each fif_i is a distinct failure scenario.

  • Threat Modeling:

The threat model is a directed graph G=(V,E)\mathcal{G}=(V,E); VV are model components, EE are their interconnections, and attack paths are sequences pp achieving an adversarial objective.

  • Risk Metric:

Risk(f)=P(exploitationf)×Impact(f)\text{Risk}(f) = P(\text{exploitation} \mid f) \times \mathrm{Impact}(f)

5. Recommendations and Best Practices

  • Systems-theoretic testing across micro and macro scales, including emergent agent-agent and human-agent behavior
  • Alignment and integration of red team findings within continuous TEVV pipelines
  • Coordinated disclosure policies with standardized templates and safe-harbor mechanisms
  • Bidirectional feedback linking macro findings to micro test priorities and vice versa
  • Extension of threat modeling to non-technical and emergent behaviors
  • Continuous drift monitoring and periodic re-red-teaming as models and environments change
  • Use of recognized frameworks such as MITRE ATT&CK, CART, and tools for STPA
  • Tracking of coverage and risk-reduction metrics (% of lifecycle stages tested, adversary profiles, ΔRisk\Delta\text{Risk} per engagement)
  • Governance checkpoints inserted at all critical system milestones (design freeze, deployment, data-refresh, retirement)

6. Impact and Strategic Transformation

Operationalizing the two-level ReD Setup with rigorous workflows, formal risk modeling, and continuous integration with organizational TEVV processes shifts AI red teaming from reactive, model-centric “security theater” to a systemic, anticipatory activity. Organizations leveraging this structured approach position themselves to identify and remediate not only technical vulnerabilities, but also vulnerabilities arising from complex sociotechnical interactions, thus enhancing trust, robustness, and safety in deployed AI systems (Majumdar et al., 7 Jul 2025).

Definition Search Book Streamline Icon: https://streamlinehq.com
References (1)

Whiteboard

Topic to Video (Beta)

Follow Topic

Get notified by email when new papers are published related to ReD Setup.