Agentic Risk Taxonomy Framework
- Agentic risk taxonomy is a structured framework for identifying and quantifying risks in autonomous AI systems using multi-dimensional metrics such as Decision Authority, Process Autonomy, and Accountability Configuration.
- It categorizes hazards into novel agentic risks—like tool misuse, cascading action chains, and emergent collusion—and traditional LLM risks to map vulnerabilities across various layers.
- The taxonomy supports adaptive oversight and dynamic governance by facilitating risk discovery, continuous monitoring, and quantifiable thresholds for effective lifecycle management.
Agentic risk taxonomy provides a structured framework for the identification, measurement, and governance of risks introduced by AI systems that possess agentic characteristics—namely, the capacity for autonomous planning, decision-making, tool use, self-adaptation, multi-agent interaction, and emergent behavior. Unlike static categorization schemes for conventional AI, agentic risk taxonomies are inherently multi-dimensional, dynamic, and context-sensitive, reflecting the complexities and evolving boundaries of modern agentic AI deployment (Engin et al., 16 May 2025, Khan et al., 2 Dec 2025, Ghosh et al., 27 Nov 2025). Agentic risk taxonomies orient both technical and organizational stakeholders in characterizing the operational risk surface, mapping vulnerabilities, and driving adaptive oversight, assurance, and mitigation throughout the AI lifecycle.
1. Foundations: Core Dimensions and Formal Structure
The evolution from static AI to agentic AI has precipitated a shift from fixed categorical risk management to dimensional and composable frameworks. The most widely referenced dimensional schema is the 3As model (Engin et al., 16 May 2025):
- Decision Authority (DA): Quantifies the share of decisions finalized by the system vs. humans; $\mathrm{DA} \in [0,1]$.
- Process Autonomy (PA): Measures the extent of self-initiated adaptation/reconfiguration; $\mathrm{PA} \in [0,1]$.
- Accountability Configuration (AC): Captures the allocation and clarity of responsibility, from centralized to diffused; $\mathrm{AC} \in [0,1]$.
Each axis is equipped with a formal metric on a continuous $[0,1]$ scale, enabling risk quantification as the system transitions from human-dominant to fully autonomous, multi-agent configurations. Critical trust thresholds (Verification-to-Delegation for PA, Information-to-Authority for DA, Individual-to-Collective for AC) denote governance escalation points.
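To make the dimensional schema concrete, the following minimal Python sketch positions a system in the $[0,1]^3$ space of the 3As model and flags axes that cross their trust thresholds. The numeric threshold values are placeholders; the cited work defines the threshold concepts, not these numbers.

```python
from dataclasses import dataclass

# Placeholder trust thresholds per axis; the 3As model names the threshold
# concepts (Information-to-Authority, Verification-to-Delegation,
# Individual-to-Collective) but these numeric values are illustrative only.
TRUST_THRESHOLDS = {"DA": 0.5, "PA": 0.5, "AC": 0.5}

@dataclass
class ThreeAsProfile:
    """A system's position in the [0,1]^3 risk space of the 3As model."""
    da: float  # Decision Authority: share of decisions finalized by the system
    pa: float  # Process Autonomy: extent of self-initiated adaptation
    ac: float  # Accountability Configuration: 0 = centralized, 1 = diffused

    def escalations(self) -> list[str]:
        """Axes whose values cross their trust thresholds (escalation points)."""
        values = {"DA": self.da, "PA": self.pa, "AC": self.ac}
        return [axis for axis, v in values.items() if v >= TRUST_THRESHOLDS[axis]]

profile = ThreeAsProfile(da=0.7, pa=0.4, ac=0.6)
print(profile.escalations())  # ['DA', 'AC'] -> escalate governance on these axes
```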
Hierarchically, agentic risk taxonomies may be represented as tuples or set mappings $\mathcal{T} = (\mathcal{C}, \mu, \nu)$, where $\mathcal{C}$ is a set of top-level categories, $\mu$ maps categories to subcategories, and $\nu$ maps subcategories to agentic lifecycle phases and specific vulnerabilities (Khan et al., 2 Dec 2025, Ghosh et al., 27 Nov 2025).
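A minimal sketch of this tuple representation, with the concrete category, subcategory, and phase names serving only as hypothetical examples drawn from the categories discussed later in this article:

```python
# Sketch of a taxonomy tuple (C, mu, nu): a set of top-level categories,
# a map to subcategories, and a map to lifecycle phases and vulnerabilities.
# All concrete names below are illustrative stand-ins.
categories = {"NovelAgentic", "TraditionalLLM", "Security"}

subcategories = {  # mu: category -> subcategories
    "NovelAgentic": {"ToolMisuse", "CascadingActionChains", "EmergentCollusion"},
    "TraditionalLLM": {"Hallucination", "Toxicity", "Bias"},
    "Security": {"PromptInjection", "DataPoisoning", "ModelExtraction"},
}

lifecycle_map = {  # nu: subcategory -> (lifecycle phases, example vulnerabilities)
    "ToolMisuse": ({"Act"}, ["out-of-scope shell invocation"]),
    "CascadingActionChains": ({"Plan", "Act"}, ["upstream poisoning amplified downstream"]),
    "EmergentCollusion": ({"Observe", "Reflect"}, ["coordinated unsafe behavior across agents"]),
}
```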
2. Typologies: Taxonomic Categories and Layered Structures
Taxonomies in leading frameworks capture agentic risk at multiple layers and along several axes:
| Framework | Taxonomic Structure | Primary Categories |
|---|---|---|
| 3As Model | Continuous-dimensional (DA, PA, AC) | Risk levels: A–E (Human-Dominant to Multi-Agent/Emergent) |
| MAESTRO | Layered (L1–L7: Foundation Model, Data, Framework, Infrastructure, Observability, Security, Ecosystem) | Threat types per layer: e.g., prompt injection, memory poisoning |
| AGENTSAFE | Tuple structure over risk dimensions and phases {Plan, Act, Observe, Reflect} | Security, Privacy, Safety, Fairness, Accountability, etc. |
| BAD-ACTS | Flat hierarchy, 5 classes, 17 subcategories | Malware, Human Abuse, Harmful Content, Unauthorized Actions |
| AURA | Dimensions/factors with $\gamma$-based composite scoring | Accountability, Transparency, Fairness, Privacy, Security, etc. |
| TRiSM | Pillared: Adversarial, Leakage, Coordination, Emergent | Prompt/Backdoor attacks, Collusion, Memory Poisoning |
The MAESTRO framework, for example, maps agentic threats onto a technology stack from model internals to inter-agent interfaces, enabling cross-layer propagation and mitigation analysis (Zambare et al., 12 Aug 2025).
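A hedged sketch of such a layer-to-threat mapping follows; the layer labels track the table above, while the placement of individual threats is illustrative rather than MAESTRO's authoritative assignment:

```python
# Illustrative MAESTRO-style mapping from stack layers (L1-L7) to example
# threats; the threat placement is a sketch, not the framework's own tables.
MAESTRO_LAYERS = {
    "L1_FoundationModel": ["prompt injection", "model extraction"],
    "L2_Data": ["data poisoning", "memory poisoning"],
    "L3_Framework": ["tool misuse", "orchestrator confusion"],
    "L4_Infrastructure": ["privilege escalation", "resource exhaustion"],
    "L5_Observability": ["telemetry tampering"],
    "L6_Security": ["guardrail bypass"],
    "L7_Ecosystem": ["emergent collusion", "side channel leakage"],
}

def layers_affected(threat: str) -> list[str]:
    """Layers on which a threat surfaces, for cross-layer propagation analysis."""
    return [layer for layer, threats in MAESTRO_LAYERS.items() if threat in threats]

print(layers_affected("memory poisoning"))  # ['L2_Data']
```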
3. Representative Categories and Illustrative Subcategories
Taxonomies converge on a core set of uniquely agentic risks beyond standard LLM safety/security failures (Ghosh et al., 27 Nov 2025, Khan et al., 2 Dec 2025):
- Novel Agentic Risks (not present in non-agentic LLMs):
- Tool Misuse: Unsafe or out-of-scope tool invocation (e.g., shell exploits, file exfiltration)
- Cascading Action Chains: Multi-step workflows wherein upstream error/poisoning amplifies downstream harms
- Control Amplification: Unauthorized privilege escalation via tool orchestration or environment control
- Memory Entanglement: Unintended information flow between tasks via shared or persistent memory
- Emergent Collusion: Coordinated adversarial or unsafe behavior by two or more agents
- Side Channel Leakage: State inference via timing, resource, or format artifacts
- Orchestrator Confusion: Ambiguity or staleness in orchestration/policy layers leading to unsafe actions
- Traditional LLM Risks (e.g., hallucination, toxicity, privacy leak, bias)
- Security Risks (e.g., direct/indirect prompt injection, data/model poisoning, model extraction)
Agentic error taxonomies such as TRAIL structure failure modes as Reasoning (e.g., hallucination, tool misinterpretation), Execution (e.g., configuration, API failure, resource exhaustion), and Coordination (e.g., goal deviation, context loss) (Deshpande et al., 13 May 2025).
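A compact sketch of TRAIL-style per-span error tagging on an agent trace; the span fields, impact weights, and cumulative score below are hypothetical illustrations of the idea rather than TRAIL's exact schema:

```python
from enum import Enum

class ErrorClass(Enum):
    REASONING = "reasoning"        # e.g., hallucination, tool misinterpretation
    EXECUTION = "execution"        # e.g., configuration error, API failure
    COORDINATION = "coordination"  # e.g., goal deviation, context loss

# Hypothetical per-span error tags on an agent trace, in TRAIL's spirit.
trace_errors = [
    {"span": 3, "cls": ErrorClass.REASONING, "label": "tool misinterpretation", "impact": 0.8},
    {"span": 7, "cls": ErrorClass.EXECUTION, "label": "API failure", "impact": 0.4},
    {"span": 9, "cls": ErrorClass.COORDINATION, "label": "context loss", "impact": 0.6},
]

# A simple cumulative risk score over the trace (illustrative aggregation).
cumulative_risk = sum(err["impact"] for err in trace_errors)
print(f"cumulative trace risk: {cumulative_risk:.1f}")  # 1.8
```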
4. Quantitative Metrics and Risk Scoring Formalisms
Quantitative risk scoring provides a basis for prioritizing remediations and benchmarking system robustness:
- Continuous Metrics (3As): $\mathrm{DA}$, $\mathrm{PA}$, and $\mathrm{AC}$, each on a continuous $[0,1]$ scale (Engin et al., 16 May 2025).
- Multiplicative Risk Scoring: $R = L \times I \times E$, with likelihood $L$, impact $I$, and exploitability $E$ each rated on a 1–3 scale (Zambare et al., 12 Aug 2025).
- Gamma-based Scoring (AURA): a composite score $\gamma$ obtained by context-weighted aggregation of risk dimensions together with variance and concentration coefficients (Chiris et al., 17 Oct 2025).
- ASTRA Empiricals: Violation rate per agent (e.g., Guardrail Bypass: mean 45%, range 5%–85%), invalid tool usage, indirect injection, leakage (Hazan et al., 22 Nov 2025).
- TRiSM Metrics: Component Synergy Score (CSS) and Tool Utilization Efficacy (TUE) measure collaboration and correct tool invocation, used as SLAs or health indicators for large-scale agentic deployments (Raza et al., 4 Jun 2025).
Risk metrics are often coupled with thresholds for governance; crossing a defined value (e.g., a critical $\mathrm{PA}$ level or composite risk score) can automatically escalate to human-in-the-loop oversight or mandate audit interventions, as in the sketch below.
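A minimal sketch combining the multiplicative score with threshold-gated escalation; the escalation threshold and function names are assumptions for illustration:

```python
def multiplicative_risk(likelihood: int, impact: int, exploitability: int) -> int:
    """R = L * I * E, each factor rated on a 1-3 scale, so R ranges over 1..27."""
    for factor in (likelihood, impact, exploitability):
        assert 1 <= factor <= 3, "each factor is rated on a 1-3 scale"
    return likelihood * impact * exploitability

ESCALATION_THRESHOLD = 12  # hypothetical governance threshold, not from the source

def governance_action(risk: int) -> str:
    """Crossing the threshold escalates to human-in-the-loop oversight or audit."""
    return "escalate_to_human_oversight" if risk >= ESCALATION_THRESHOLD else "monitor"

r = multiplicative_risk(likelihood=3, impact=2, exploitability=3)
print(r, governance_action(r))  # 18 escalate_to_human_oversight
```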
5. Methodologies for Risk Discovery, Mapping, and Mitigation
Agentic risk taxonomies inform comprehensive lifecycle governance via an integrated approach (Khan et al., 2 Dec 2025, Ghosh et al., 27 Nov 2025):
- Risk Discovery: Employ automated red-teaming agents, scenario banks, and auxiliary classifiers to populate the risk-event space. Case studies (e.g., NVIDIA AI-Q Research Assistant) utilize >10,000 attack/defense traces to empirically discover and classify risks across the taxonomy.
- Dynamic Classification: Apply risk classification models that map observed events to taxonomy categories, with human expert adjudication for high-severity or novel events; see the sketch after this list.
- Operationalization: Map each risk category to specific controls at design time (least-privilege sandboxes, policy gates), runtime (semantic telemetry, dynamic authorization, anomaly and containment mechanisms), and audit (cryptographic provenance, action graphs) (Khan et al., 2 Dec 2025).
- Continuous Evaluation: Deploy monitoring agents for trace anomaly detection, and loop all emergent events back into red-teaming and taxonomy refinement, realizing a closed-loop assurance pipeline.
- Adaptive Governance: Enable thresholds and governance tiering (e.g., periodic recalibration of 3A axes, updating escalation protocols when process autonomy or accountability configuration metrics drift systematically) (Engin et al., 16 May 2025).
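The closed-loop shape of this methodology can be sketched as follows; the classifier, severity cutoff, and adjudication hook are hypothetical stand-ins rather than any cited system's implementation:

```python
# Hypothetical closed-loop assurance sketch: classify events against the
# taxonomy, route high-severity or novel cases to expert adjudication, and
# feed outcomes back into the scenario bank for red-teaming and refinement.
SEVERITY_CUTOFF = 7.0  # illustrative cutoff, not a value from the source

def classify(event: dict) -> tuple[str, float]:
    """Stand-in for a learned risk classifier: event -> (category, severity)."""
    return event.get("category", "Unclassified"), event.get("severity", 0.0)

def human_adjudicate(event: dict) -> str:
    """Placeholder for human expert review; returns the adjudicated category."""
    return event.get("category", "NovelAgentic")

def assurance_loop(events: list[dict], scenario_bank: list[dict]) -> None:
    for event in events:
        category, severity = classify(event)
        if severity >= SEVERITY_CUTOFF or category == "Unclassified":
            category = human_adjudicate(event)  # high-severity/novel -> human review
        # Emergent events loop back into red-teaming and taxonomy refinement.
        scenario_bank.append({"category": category, "event": event})
```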
6. Benchmarks, Coverage, and Empirical Results
Leading works have produced standardized evaluation datasets, scenario banks, and empirical breakdowns to support comparative risk assessment:
- BAD-ACTS: 17-subcategory taxonomy, 188 harmful-action examples, demonstrating higher attack success rates for agentic risks (e.g., “Denial-of-Service,” “CascadingActionChains”) and persistent model vulnerabilities in multi-agent settings (Nöther et al., 22 Aug 2025).
- AI-Q Research Assistant: Agentic risk traces (n=7,308) with calibrated per-trace severity scores. Novel agentic risks reached higher mean severity (7.2) than traditional safety (6.1) or security (6.4) risks. ControlAmplification and CascadingActionChains accounted for the plurality of high-severity incursions (Ghosh et al., 27 Nov 2025).
- TRAIL: Three-level hierarchical error taxonomy (Reasoning, Execution, Planning/Coordination), with per-span impact, cumulative risk scores, and direct linkage to tool remediation and trace-driven benchmarking (Deshpande et al., 13 May 2025).
- ASTRA: Systematic, attack-pattern-driven violation rates across LLM-backed agents, revealing considerable dispersion by model and attack type (e.g., up to 85% guardrail bypass for certain open-source models) (Hazan et al., 22 Nov 2025).
Benchmarks support scenario-based safety evaluation, coverage reporting, and continuous improvement pipelines aligned with agentic risk taxonomies.
7. Synthesis and Governance Implications
Agentic risk taxonomies underpin adaptive, multidimensional approaches to governance and safety engineering for advanced AI systems. They provide rigorously defined axes, hierarchies, and metrics for:
- Locating any agentic system in a multi-dimensional risk space;
- Enabling dynamic monitoring, drift detection, and escalation as agent capability or context evolves;
- Assigning specific controls—grounding, gating, interruptibility, provenance—across the agent lifecycle;
- Supporting harmonization with regulatory protocols (e.g., NIST AI RMF, OWASP, EU AI Act), ensuring traceable lines of accountability and compliance (Engin et al., 16 May 2025, Khan et al., 2 Dec 2025, Raza et al., 4 Jun 2025, Datta et al., 27 Oct 2025).
These taxonomies form the backbone of modern agentic AI governance, facilitating context-responsive, quantitative risk oversight for deployment at scale. Ongoing research focuses on extending coverage to underexplored attack vectors (e.g., lateral movement, side channels), developing formal metrics for multi-agent resilience, and integrating closed-loop, AI-plus-human-in-the-loop safety assurance (Ghosh et al., 27 Nov 2025, Raza et al., 4 Jun 2025, Datta et al., 27 Oct 2025).