Cisco Integrated AI Security & Safety
- Cisco Integrated AI Security and Safety Framework is a comprehensive, lifecycle-spanning scheme that unifies AI security (confidentiality, integrity, availability) with AI safety (ethical alignment, fairness, transparency).
- The framework integrates four layers—governance, risk management, technical controls, and continuous monitoring—to proactively mitigate threats and manage risks.
- It operationalizes risk management using threat mapping, red-teaming, and quantitative metrics (e.g., ASR, calibration error) to enhance system resilience.
Cisco’s Integrated AI Security and Safety Framework is a comprehensive, lifecycle-spanning taxonomy and operationalization framework that unifies AI security (protection of system confidentiality, integrity, availability, and accountability) with AI safety (ethical alignment, fairness, transparency, and reliable behavior). Its purpose is to provide a structured methodology for identifying and managing the entire spectrum of risks—spanning content harms, technical exploits, supply-chain compromises, runtime manipulations, and emergent adversarial behavior—across diverse AI modalities and agentic architectures. The Framework extends prior fragmented standards (MITRE ATLAS, NIST AML, OWASP Top 10) by supporting seamless integration, threat mapping, red-teaming, risk scoring, and defense-in-depth for enterprise-scale, multi-modal, and multi-agent AI deployments (Chang et al., 15 Dec 2025, Tallam, 9 May 2025, Qi et al., 2024).
1. Foundational Definitions and Taxonomy
The Framework differentiates between AI safety and AI security by both objective and risk modeling formalism:
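- AI Safety aims to prevent harm stemming from unintended, non-adversarial failures, such as misalignment, distribution shift, or calibration errors. Formally, safety risk is defined as expected loss over the operational distribution $\mathcal{D}$, for a model $f$ and loss function $\ell$: $R_{\text{safety}}(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\ell(f(x), y)\right]$.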
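- AI Security focuses on resilience against adversarial threats seeking to compromise integrity, confidentiality, or availability. Security risk employs a worst-case (minimax) formalism in which an adversary selects a perturbation $\delta$ from an allowed set $\Delta$: $R_{\text{security}}(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\delta\in\Delta}\ell(f(x+\delta), y)\right]$, which the defender minimizes over $f$.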
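The contrast between average-case and worst-case risk can be made concrete empirically. The sketch below is a minimal illustration under stated assumptions, not tooling from the Framework: it assumes a 0-1 loss, a toy scalar threshold classifier (`threshold_classifier`, hypothetical), and a finite list of candidate perturbations standing in for the adversary's set $\Delta$.

```python
from typing import Callable, Iterable, Sequence, Tuple

Sample = Tuple[float, int]
Classifier = Callable[[float], int]


def zero_one_loss(prediction: int, label: int) -> float:
    """0-1 loss: 1.0 for a misclassification, 0.0 otherwise."""
    return 0.0 if prediction == label else 1.0


def safety_risk(f: Classifier, data: Sequence[Sample]) -> float:
    """Empirical expected loss over samples from the operational distribution."""
    return sum(zero_one_loss(f(x), y) for x, y in data) / len(data)


def security_risk(f: Classifier, data: Sequence[Sample],
                  perturbations: Iterable[float]) -> float:
    """Empirical worst-case loss: for each sample, the largest loss over the
    candidate perturbations (a finite stand-in for the max over Delta)."""
    deltas = list(perturbations)
    # Including the zero perturbation guarantees security risk >= safety risk.
    if 0.0 not in deltas:
        deltas.append(0.0)
    worst = [max(zero_one_loss(f(x + d), y) for d in deltas) for x, y in data]
    return sum(worst) / len(data)


def threshold_classifier(x: float) -> int:
    """Hypothetical toy model: predicts class 1 when x >= 0.5."""
    return 1 if x >= 0.5 else 0


if __name__ == "__main__":
    samples = [(0.1, 0), (0.45, 0), (0.7, 1), (0.9, 1)]
    print(safety_risk(threshold_classifier, samples))                 # 0.0
    print(security_risk(threshold_classifier, samples, [-0.1, 0.1]))  # 0.25
```

The classifier is perfect on the clean samples, yet a perturbation of 0.1 pushes the sample at 0.45 across the decision boundary, so the worst-case estimate is strictly higher than the average-case one.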
The key distinctions are average-case (safety) versus worst-case (security) analysis, and accidental faults versus adaptive adversaries as the underlying threat models. The Framework organizes adversarial behaviors into a hierarchy of Objectives (19), Techniques (40), Subtechniques (112), and evolving Procedures (Chang et al., 15 Dec 2025). This layered taxonomy, spanning content safety, model/data integrity, runtime manipulation, and supply-chain/AI ecosystem risks, maps onto the development, deployment, and post-deployment stages.
2. Architecture: Four-Layer, Lifecycle-Aware Governance
Controls and responsibilities are embedded in four concentric layers, each supporting both safety and security objectives (Qi et al., 2024):
| Layer | Focus | Example Responsibilities |
|---|---|---|
| Governance & Policy Integration | Strategic oversight | AI Risk Steering Committee, policy codification (ISO 31000, NIST RMF, IEC 62443) |
| Risk Management Process | Cyclic risk lifecycle | Joint hazard taxonomy, integrated threat modeling, risk register, hazard log |
| Technical Controls | Operational defense | Alignment modules, adversarial training, anomaly detection, logging, watermarking |
| Monitoring & Continuous Assurance | Post-deployment validation | Dashboards (safety/security metrics), red teaming, incident response, compliance audits |
Each layer reflects a phase in the Identify → Assess → Mitigate → Monitor risk management cycle. Governance establishes dual-policy mandates (safety and security); technical layers execute these mandates through architecture, controls, dashboards, and audit mechanisms (Qi et al., 2024).
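A joint risk register can track where each risk sits in the Identify → Assess → Mitigate → Monitor cycle. The following is a minimal sketch; the field names (`risk_id`, `category`, `owner_layer`) and the example entry are hypothetical, chosen only to show the cyclic wrap-around from Monitor back to Identify.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Phase(Enum):
    """Phases of the cyclic risk management process, in order."""
    IDENTIFY = "identify"
    ASSESS = "assess"
    MITIGATE = "mitigate"
    MONITOR = "monitor"

    def next_phase(self) -> "Phase":
        # Monitor wraps back to Identify, keeping the process cyclic.
        members = list(Phase)
        return members[(members.index(self) + 1) % len(members)]


@dataclass
class RiskEntry:
    """One row of a joint safety/security risk register (hypothetical fields)."""
    risk_id: str
    category: str      # "safety", "security", or "hybrid"
    description: str
    owner_layer: str   # e.g. "Technical Controls"
    phase: Phase = Phase.IDENTIFY
    history: List[Phase] = field(default_factory=list)

    def advance(self) -> None:
        """Record the current phase and move to the next one."""
        self.history.append(self.phase)
        self.phase = self.phase.next_phase()


if __name__ == "__main__":
    entry = RiskEntry("R-001", "security",
                      "Prompt injection via retrieved documents",
                      "Technical Controls")
    for _ in range(4):
        entry.advance()
    print(entry.phase, len(entry.history))  # Phase.IDENTIFY 4
```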
3. Risk Management Methodology and Metrics
The risk management process is cyclic and integrated. For each phase:
- Identify: Map failure modes (safety) and enumerate adversarial threat scenarios (security), such as prompt injection or model extraction.
- Assess: Compute expected loss (safety) and worst-case or attack-success metrics (security), such as $\text{ASR} = N_{\text{successful attacks}} / N_{\text{attempted attacks}}$. An Adversarial Risk Score (ARS) is then used to tier risks.
- Mitigate: Employ fine-tuning, alignment, out-of-distribution training (safety); adversarial training, certified defenses, differential privacy (security).
- Monitor: Track metrics such as attack success rate, misalignment incidents, calibration error (ECE), and system KPIs over time.
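Attack success rate is straightforward to compute from red-team logs. The sketch below covers ASR only (not ARS tiering) and assumes a hypothetical log format of `(technique, succeeded)` pairs; the technique labels are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def attack_success_rate(outcomes: Iterable[bool]) -> float:
    """ASR = number of successful attacks / number of attempted attacks."""
    results = list(outcomes)
    if not results:
        raise ValueError("no attack attempts recorded")
    return sum(results) / len(results)


def asr_by_technique(log: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Group red-team outcomes by technique and compute ASR for each group."""
    grouped: Dict[str, List[bool]] = defaultdict(list)
    for technique, succeeded in log:
        grouped[technique].append(succeeded)
    return {name: attack_success_rate(v) for name, v in grouped.items()}


if __name__ == "__main__":
    red_team_log = [
        ("prompt_injection", True), ("prompt_injection", False),
        ("jailbreak", False), ("jailbreak", False), ("jailbreak", True),
        ("model_extraction", False),
    ]
    # 2 of 6 attempts succeeded overall.
    print(attack_success_rate(ok for _, ok in red_team_log))
    print(asr_by_technique(red_team_log))
```

Breaking ASR down per technique shows which attack classes drive the aggregate, which is what tiering and mitigation prioritization need.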
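Attack Success Rate (ASR) and calibration error (ECE) are continuously plotted, and governance bodies are alerted when risk metrics exceed policy thresholds. The overall risk score is a weighted combination of the safety and security components, $R_{\text{total}} = w_{\text{safety}} R_{\text{safety}} + w_{\text{security}} R_{\text{security}}$ (Qi et al., 2024, Tallam, 9 May 2025).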
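The weighting and alerting logic can be sketched as follows. This is a minimal illustration assuming a linear combination with non-negative weights that sum to 1, and hypothetical metric names and policy thresholds; actual weights and thresholds would be set by the governance layer.

```python
from typing import Dict, List


def overall_risk(r_safety: float, r_security: float,
                 w_safety: float = 0.5, w_security: float = 0.5) -> float:
    """Weighted overall risk; assumes non-negative weights that sum to 1."""
    if w_safety < 0 or w_security < 0 or abs(w_safety + w_security - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return w_safety * r_safety + w_security * r_security


def threshold_alerts(metrics: Dict[str, float],
                     thresholds: Dict[str, float]) -> List[str]:
    """Names of metrics whose current value exceeds the policy threshold."""
    return sorted(name for name, value in metrics.items()
                  if name in thresholds and value > thresholds[name])


if __name__ == "__main__":
    current = {"ASR": 0.12, "ECE": 0.08,
               "overall": overall_risk(0.10, 0.30, w_safety=0.4, w_security=0.6)}
    policy = {"ASR": 0.10, "ECE": 0.10, "overall": 0.25}  # hypothetical thresholds
    print(threshold_alerts(current, policy))  # ['ASR']
```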
4. Threat Taxonomy: Dimensions and Lifecycle Mapping
The taxonomy comprises four principal dimensions, each decomposed into objectives, techniques, and subtechniques (Chang et al., 15 Dec 2025):
- Content Safety Failures: Includes cybersecurity/hacking, toxicity, hallucination, privacy, and intellectual property risks, addressed from training-set sanitization through real-time content filtering and incident remediation.
- Model & Data Integrity: Spans data poisoning, model trojans, label tampering, model theft/extraction, and adversarial evasion, operationalized via provenance checks, adversarial training, artifact verification, and runtime payload validation.
- Runtime Manipulations: Prompt injection, jailbreaks, tool abuse, agent impersonation, privilege escalation, and availability denial are addressed using guardrails, schema validation, real-time behavioral profiling, and incident kill-switches.
- Ecosystem & Supply-Chain Risks: Encompass deserialization vulnerabilities, dependency swaps, CI/CD tampering, MCP/agent context threats, agent collusion, and cyber-physical attack vectors, managed via artifact attestation, least-privilege containers, telemetry, and compliance mapping.
Each dimension supports threat and defense mapping at every AI lifecycle stage, including anticipatory monitoring and rapid remediation.
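Schema validation of agent tool calls is one of the runtime guardrails listed for tool abuse and privilege escalation. The sketch below assumes a hypothetical allow-list (`TOOL_SCHEMAS`, with made-up tool names) mapping each tool to required argument types; production systems would more likely validate against JSON Schema, but the control logic is the same: reject unknown tools, unexpected arguments, and wrong types.

```python
from typing import Any, Dict, List

# Hypothetical allow-list: tool name -> {argument name: required Python type}.
TOOL_SCHEMAS: Dict[str, Dict[str, type]] = {
    "search_docs": {"query": str, "max_results": int},
    "send_email": {"to": str, "subject": str, "body": str},
}


def validate_tool_call(name: str, args: Dict[str, Any]) -> List[str]:
    """Return schema violations for an agent's tool call; empty means allowed."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return [f"tool '{name}' is not on the allow-list"]
    errors = [f"unexpected argument '{key}'" for key in args if key not in schema]
    for key, expected in schema.items():
        if key not in args:
            errors.append(f"missing argument '{key}'")
        # Exact type match, so e.g. True is not accepted where an int is required.
        elif type(args[key]) is not expected:
            errors.append(f"argument '{key}' must be {expected.__name__}")
    return errors


if __name__ == "__main__":
    print(validate_tool_call("search_docs", {"query": "VPN policy", "max_results": 5}))  # []
    print(validate_tool_call("shell_exec", {"cmd": "rm -rf /"}))
    # An injected extra recipient is rejected as an unexpected argument.
    print(validate_tool_call("send_email", {"to": "a@example.com", "subject": "hi",
                                            "body": "x", "bcc": "attacker@example.com"}))
```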
5. Operationalization: Practices, Red-Teaming, and Incident Response
The Framework prescribes explicit organizational structures (AI Risk Office), codified policies, and practical integration guidance (Qi et al., 2024, Tallam, 9 May 2025, Chang et al., 15 Dec 2025):
- Threat modeling leverages STRIDE/ISO 31000 hybrid matrices, risk registers, and ARS dashboards.
- Adversarial testing and red-teaming are required for both safety (semantic misalignment, social bias) and security (adaptive jailbreaks, backdoors, model extraction), using benchmarks such as HarmBench and RobustBench.
- Anomaly detection is implemented using sequential hypothesis tests (SPRT, CUSUM), integrating data from infrastructure telemetry and linking detections to incident orchestration and automated containment.
- Incident response is unified via triage playbooks—classifying events as safety, security, or hybrid—with escalation to the AI Risk Office and C-suite. Post-incident reviews inform risk model updates and continuous improvement cycles.
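CUSUM, one of the sequential tests named above, accumulates positive deviations from a baseline and alarms when the sum crosses a threshold. The sketch below is a one-sided (upper) CUSUM; the monitored signal (an hourly fraction of guardrail-flagged requests), the baseline, the slack, and the threshold are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence


def cusum_alarms(values: Sequence[float], target_mean: float,
                 slack: float, threshold: float) -> List[int]:
    """One-sided (upper) CUSUM change detector.

    S_t = max(0, S_{t-1} + x_t - target_mean - slack); raise an alarm when
    S_t > threshold, then reset S_t to 0 so later shifts are detected afresh.
    Returns the indices at which alarms fire.
    """
    s = 0.0
    alarms: List[int] = []
    for t, x in enumerate(values):
        s = max(0.0, s + (x - target_mean - slack))
        if s > threshold:
            alarms.append(t)
            s = 0.0
    return alarms


if __name__ == "__main__":
    # Hypothetical hourly fraction of requests flagged by a jailbreak guardrail.
    flagged_rate = [0.02, 0.03, 0.01, 0.02, 0.10, 0.12, 0.11, 0.13]
    print(cusum_alarms(flagged_rate, target_mean=0.02, slack=0.01, threshold=0.15))  # [5, 7]
```

The `slack` term absorbs small fluctuations around the baseline, so the statistic only grows under a sustained shift; each alarm index would be the point at which a detection is handed to incident orchestration.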
Compliance and governance workflows include lineage and integrity tracking (SBOM/AIBOM), regulatory mapping (EU AI Act, NIST AI RMF), automated audit logging, and runtime metrics reporting.
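Integrity tracking typically reduces to comparing recorded cryptographic digests against the artifacts actually deployed. The sketch below assumes a deliberately simplified, hypothetical manifest (relative path mapped to hex SHA-256), far leaner than a real SBOM/AIBOM document, and flags missing or modified files.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large model weights need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(manifest: Dict[str, str], root: Path) -> List[str]:
    """Check recorded digests (relative path -> hex SHA-256) against files under root.

    Returns the relative paths that are missing or whose contents changed.
    """
    failures: List[str] = []
    for rel_path, expected in manifest.items():
        file_path = root / rel_path
        if not file_path.is_file() or sha256_of(file_path) != expected:
            failures.append(rel_path)
    return failures
```

A non-empty result from `verify_artifacts` would be a natural trigger for the automated audit log and for blocking promotion in CI/CD.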
6. Metrics, Guarantees, and Continuous Assurance
Quantitative metrics underpin assurance and GRC integration:
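| Metric | Definition |
|---|---|
| Safety and security risk | $R_{\text{safety}} = \mathbb{E}_{(x,y)\sim\mathcal{D}}[\ell(f(x),y)]$ (expected loss); $R_{\text{security}} = \mathbb{E}_{(x,y)\sim\mathcal{D}}[\max_{\delta\in\Delta}\ell(f(x+\delta),y)]$ (worst-case loss) |
| Attack Success Rate (ASR) | $\text{ASR} = N_{\text{successful attacks}} / N_{\text{attempted attacks}}$ |
| Expected Calibration Error (ECE) | $\text{ECE} = \sum_{m=1}^{M} \frac{\lvert B_m\rvert}{n}\,\lvert \text{acc}(B_m) - \text{conf}(B_m)\rvert$ over $M$ confidence bins $B_m$ and $n$ samples |
| Robust accuracy | $\Pr_{(x,y)\sim\mathcal{D}}\left[f(x+\delta) = y \ \ \forall \delta\in\Delta\right]$, with $\Delta = \{\delta : \lVert\delta\rVert \le \varepsilon\}$ |
| System-level risk | $R_{\text{total}} = w_{\text{safety}} R_{\text{safety}} + w_{\text{security}} R_{\text{security}}$ |
| Monitoring KPIs | $\text{ASR}(t)$, $\text{ECE}(t)$, tracked over time against policy thresholds |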
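ECE is usually estimated with equal-width confidence bins. The sketch below implements that standard binned estimator; the default of 10 bins is a common convention, assumed here rather than taken from the Framework.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """ECE = sum_m |B_m|/n * |acc(B_m) - conf(B_m)| over equal-width bins on [0, 1]."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        acc = sum(1 for _, ok in b if ok) / len(b)   # empirical accuracy in bin
        conf = sum(p for p, _ in b) / len(b)         # mean confidence in bin
        ece += len(b) / n * abs(acc - conf)
    return ece


if __name__ == "__main__":
    # Overconfident high bin (0.95 vs 50% accuracy) dominates the error.
    print(expected_calibration_error([0.95, 0.95, 0.55, 0.55],
                                     [True, False, True, False]))
```

Here the two 0.95-confidence predictions contribute $0.5 \times 0.45$ and the two 0.55-confidence predictions contribute $0.5 \times 0.05$, giving an ECE of about 0.25.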
Guarantees include formally certified robust accuracy (ε-certification), differential privacy ((ε, δ)-DP), and evidence from third-party verification and NN-SMT solvers (Tallam, 9 May 2025). These are reported in continuous dashboards, with periodic reviews and governance triggers.
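To make the (ε, δ) parameters concrete, the classical Gaussian mechanism shows how they translate into a noise scale. This is a generic illustration of the guarantee, not a mechanism the Framework prescribes; the released quantity in the usage line is hypothetical, and the closed-form bound used here holds only for 0 < ε < 1.

```python
import math
import random


def gaussian_sigma(epsilon: float, delta: float, l2_sensitivity: float) -> float:
    """Noise scale of the classical Gaussian mechanism.

    sigma = sensitivity * sqrt(2 ln(1.25 / delta)) / epsilon gives
    (epsilon, delta)-differential privacy when 0 < epsilon < 1.
    """
    if not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("this bound requires 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(value: float, epsilon: float, delta: float,
                       l2_sensitivity: float, rng: random.Random) -> float:
    """Release value plus Gaussian noise calibrated to (epsilon, delta)."""
    return value + rng.gauss(0.0, gaussian_sigma(epsilon, delta, l2_sensitivity))


if __name__ == "__main__":
    print(round(gaussian_sigma(0.5, 1e-5, 1.0), 3))  # 9.69
    # Hypothetical query: count of flagged prompts, one contribution per user.
    print(gaussian_mechanism(42.0, 0.5, 1e-5, 1.0, random.Random(0)))
```

Tightening ε or δ increases σ, which is the privacy/utility trade-off a dashboard would report alongside the stated guarantee.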
Reported results: after hardening, white-box attack success fell from 35% to 12%, and audit time per version declined by 60% (Tallam, 9 May 2025). Case studies in national security, open-source, and industrial contexts document measurable improvements in compliance and detection latency, along with operational gains.
7. Extensibility, Multi-Modal, and Cross-Industry Evolution
The Framework is architected for extension as AI systems diversify:
- Multimodal deployments: Applies safety and security categories and controls across text, image, audio, and sensor modalities, and supports fusion detection methods.
- Agentic and tool-aware systems: Incorporates Model Context Protocol (MCP) for agent orchestration, shared memory, and dynamic tool integration; taxonomy covers collusion, spoofing, and cross-modal attacks.
- Ecosystem integration: Embeds supply-chain and operational risks into procurement, CI/CD, and vendor assessment workflows; supports automated threat intelligence sharing via extended STIX/TAXII schemas.
- Cross-sector adaptation: Recommends embedding “standards-as-code” artifacts (AI Act, GDPR) directly into CI/CD, and active participation in regulatory and industry consortia for continuous refinement (Chang et al., 15 Dec 2025, Tallam, 9 May 2025).
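Threat intelligence sharing over STIX/TAXII starts with expressing an AI threat as a STIX object. The sketch below builds a STIX 2.1 `attack-pattern` using standard common properties; the `x_ai_risk_dimension` and `x_ai_lifecycle_stage` custom properties are hypothetical stand-ins for whatever schema extension an organization adopts, and TAXII transport is not shown.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def stix_timestamp(moment: datetime) -> str:
    """UTC timestamp with millisecond precision, as STIX 2.1 uses for created/modified."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def ai_attack_pattern(name: str, description: str,
                      risk_dimension: str, lifecycle_stage: str) -> Dict[str, Any]:
    """Wrap an AI threat as a STIX 2.1 attack-pattern object.

    The x_-prefixed custom properties are hypothetical, not part of any
    published extension.
    """
    now = stix_timestamp(datetime.now(timezone.utc))
    return {
        "type": "attack-pattern",
        "spec_version": "2.1",
        "id": f"attack-pattern--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "description": description,
        "x_ai_risk_dimension": risk_dimension,
        "x_ai_lifecycle_stage": lifecycle_stage,
    }


if __name__ == "__main__":
    obj = ai_attack_pattern(
        "Indirect prompt injection",
        "Instructions hidden in retrieved content redirect an agent's tool use.",
        "Runtime Manipulations",
        "deployment",
    )
    print(json.dumps(obj, indent=2))
```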
Planned extensions include a dedicated agentic taxonomy for collusion and orchestration abuse, and the continuous addition of subtechniques for new modalities and cyber-physical vectors.
By uniting security and safety under an integrated taxonomy and operational workflow, the Cisco Integrated AI Security and Safety Framework offers an empirically grounded, extensible foundation for quantifying, prioritizing, and mitigating AI risks across the entire AI lifecycle and deployment landscape (Chang et al., 15 Dec 2025, Tallam, 9 May 2025, Qi et al., 2024).