Singapore Consensus on Global AI Safety
- Singapore Consensus on Global AI Safety is an internationally recognized framework unifying technical standards, risk metrics, and treaties to ensure trustworthy AI systems.
- It adopts a defence-in-depth model that integrates risk assessment, development safeguards, and post-deployment controls for comprehensive AI risk management.
- The framework leverages international standards and treaty mechanisms to enforce compute thresholds, continuous oversight, and harmonized regulatory compliance.
The Singapore Consensus on Global AI Safety is an internationally recognized framework that consolidates technical, regulatory, and institutional strategies for mitigating risks associated with advanced artificial intelligence. Emerging from the 2025 Singapore Conference on AI (SCAI) and synthesizing scientific, engineering, and policy perspectives, it provides a multi-level, interoperable construct for ensuring that AI systems are trustworthy, reliable, and secure across all stages of development and deployment (Bengio et al., 25 Jun 2025). The Consensus integrates research priorities, formal standards, and treaty mechanisms under a modular, defence-in-depth architecture, and sets explicit requirements for risk management, verification, and continuous oversight.
1. Conceptual Foundations and Definitional Scope
The Consensus builds on a rigorous terminological and metrological substrate. “Safety” is formally defined as freedom from unacceptable risk, operationalized through user-defined or regulator-imposed risk thresholds that account for adversarial and ambient environmental conditions. “Trustworthiness” is the capacity to meet stakeholder expectations in a verifiable manner, subsuming attributes such as accountability, transparency, integrity, robustness, and reliability (Jeon, 2024). Core risk metrics reference ISO/IEC TR 24028:2020, ISO/IEC 23894:2023, and the NIST AI Risk Management Framework, encompassing likelihood–severity matrices, risk-priority numbers, and mappings from system states to risk levels.
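As a minimal illustration of how such metrics can be operationalized, the sketch below computes a classic risk-priority number (RPN) and maps it onto discrete risk levels; the rating scales and cut-off values are hypothetical placeholders, not figures taken from the cited standards.

```python
from enum import Enum

class RiskLevel(Enum):
    ACCEPTABLE = "acceptable"
    TOLERABLE = "tolerable"        # requires mitigation and monitoring
    UNACCEPTABLE = "unacceptable"  # exceeds the stated threshold; blocks deployment

def risk_priority_number(severity: int, likelihood: int, detectability: int) -> int:
    """Classic RPN: product of 1-10 ratings for severity, likelihood of
    occurrence, and difficulty of detection (higher = harder to detect)."""
    for name, value in (("severity", severity), ("likelihood", likelihood),
                        ("detectability", detectability)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be rated 1-10, got {value}")
    return severity * likelihood * detectability

def classify(severity: int, likelihood: int, detectability: int,
             acceptable_max: int = 30, tolerable_max: int = 100) -> RiskLevel:
    """Map an RPN onto regulator- or user-defined thresholds
    (the cut-offs here are illustrative placeholders)."""
    rpn = risk_priority_number(severity, likelihood, detectability)
    if rpn <= acceptable_max:
        return RiskLevel.ACCEPTABLE
    if rpn <= tolerable_max:
        return RiskLevel.TOLERABLE
    return RiskLevel.UNACCEPTABLE

# Example: moderately severe hazard, unlikely but hard to detect (RPN = 126).
print(classify(severity=6, likelihood=3, detectability=7))  # RiskLevel.UNACCEPTABLE
```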
Foundational to the Consensus is the establishment of a unified taxonomy (LLM, AGI, GPAI, etc.) and shared glossary based on ISO/IEC 22989 and extensions for generative AI phenomena (hallucination, value alignment) (Jeon, 2024, Scholefield et al., 18 Mar 2025). These standards create technical and linguistic interoperability required for benchmarking, risk assessment, and compliance validation at global scale.
2. Structural Pillars: Defence-in-Depth Model
The Singapore Consensus organizes AI safety under a three-domain, defence-in-depth model:
- Risk Assessment targets identification, quantification, and prioritization of AI-associated hazards, using evolving benchmarks, scenario analysis, and probabilistic modeling of both technical and socio-technical risk vectors.
- Development addresses the specification, design, and verification of safe systems, encompassing robust specification (single- and multi-stakeholder), adversarial training, formal methods (compositional verification, program synthesis), and targeted model editing.
- Control comprises post-deployment monitoring and intervention architectures, including telemetry, incident reporting, override protocols (kill switches), scalable oversight, and ecosystem-level forensics (Bengio et al., 25 Jun 2025).
Each domain is subdivided into granular research priorities—seven for assessment, including secure infrastructure and dangerous capability elicitation; multiple for development, spanning dataset curation to mechanistic interpretability; and advanced controls for both individual systems and the broader AI ecosystem.
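The domain structure lends itself to a machine-readable taxonomy; the sketch below encodes only a subset of the research priorities named in the surrounding text and is illustrative rather than an official enumeration of the Consensus.

```python
# Illustrative, partial encoding of the defence-in-depth taxonomy.
DEFENCE_IN_DEPTH = {
    "risk_assessment": [
        "metrology for precise, repeatable risk measures",
        "evolving benchmarks resistant to gaming",
        "secure evaluation infrastructure",
        "dangerous capability elicitation",
    ],
    "development": [
        "robust single- and multi-stakeholder specification",
        "adversarial training",
        "formal methods (compositional verification, program synthesis)",
        "dataset curation",
        "mechanistic interpretability",
        "targeted model editing",
    ],
    "control": [
        "telemetry and post-deployment monitoring",
        "incident reporting",
        "override protocols (kill switches)",
        "scalable oversight",
        "ecosystem-level forensics",
    ],
}
```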
3. Standardization and Technical Governance
International standardization underpins the Consensus, leveraging ISO/IEC JTC 1/SC 42 and associated working groups for core terminology, data lifecycles, robustness, bias, controllability, transparency, and functional safety (IEC 61508 SIL grading, TS 22440 series). The “Trustworthiness Fact Label” (ISO/IEC 42117 draft) operationalizes transparency, mandating that 46 trustworthiness factors be communicated alongside every deployed model (Jeon, 2024).
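A hypothetical fragment of such a fact label is sketched below; the field names and completeness check are illustrative assumptions, not structures drawn from the ISO/IEC 42117 draft itself.

```python
from dataclasses import dataclass, field

@dataclass
class TrustworthinessFactor:
    name: str      # e.g. "robustness", "transparency" (factor names are hypothetical)
    evidence: str  # pointer to the benchmark result, audit, or document backing the claim
    rating: str    # publisher-assigned rating on an agreed scale

@dataclass
class FactLabel:
    model_id: str
    publisher: str
    factors: list[TrustworthinessFactor] = field(default_factory=list)

    def is_complete(self, required_factor_count: int = 46) -> bool:
        # The cited draft calls for 46 factors to be communicated; this check
        # only verifies that that many distinct factors have been filled in.
        return len({f.name for f in self.factors}) >= required_factor_count
```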
Verification and validation (V&V) are systematized via red-teaming, adversarial testing, and both white-box and black-box evaluation, referenced in TS 29119-11 and sector-specific benchmarks (MMLU, TruthfulQA). Human–AI collaboration is formalized through mandated human-in-the-loop (HITL) and human-on-the-loop (HOTL) patterns and alignment protocols (ISO/IEC 42105/TR 42109).
Regulatory alignment is mandated: signatories must leverage international standards (ISO/IEC, IEEE) in place of local checklists, enabling mutual recognition of safety and trustworthiness certificates and simplifying audit compliance across jurisdictions (Jeon, 2024).
4. Treaty Architecture: Compute Thresholds, Oversight, and Enforcement
A key operational mechanism is the adoption of training-compute thresholds (measured in FLOP) as gating criteria for oversight and regulation. The Consensus recommends a “Conditional AI Safety Treaty” anchored at a defined compute threshold, enforced by a network of AI Safety Institutes (AISIs) empowered to audit, verify, and, if necessary, mandate immediate pauses on high-risk development (Scholefield et al., 18 Mar 2025, Miotti et al., 2023). Thresholds are subject to annual revision to accommodate algorithmic advances and shifting risk profiles.
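Because the specific FLOP value is set and annually revised by the treaty bodies, the sketch below leaves it as a parameter; the 6·N·D approximation for dense-transformer training compute is a common rule of thumb assumed here for illustration, not part of the Consensus text.

```python
def training_flop_estimate(n_parameters: float, n_training_tokens: float) -> float:
    """Rough training-compute estimate for a dense transformer:
    ~6 FLOP per parameter per token (forward + backward pass)."""
    return 6.0 * n_parameters * n_training_tokens

def requires_treaty_oversight(n_parameters: float, n_training_tokens: float,
                              threshold_flop: float) -> bool:
    """True if the planned run exceeds the (annually revised) treaty threshold
    and therefore triggers pre-deployment audits and AISI verification."""
    return training_flop_estimate(n_parameters, n_training_tokens) >= threshold_flop

# Example with a placeholder threshold of 1e26 FLOP (illustrative only):
# a 70B-parameter model trained on 15T tokens is ~6.3e24 FLOP.
print(requires_treaty_oversight(n_parameters=7e10, n_training_tokens=1.5e13,
                                threshold_flop=1e26))  # False
```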
Oversight operates at multiple levels:
- Pre-deployment audits and red-teaming (technical, infosec, and governance) must be completed for any system exceeding the treaty's compute threshold.
- Compute accounting covers pre-training and fine-tuning, with granular reporting requirements imposed on compute providers and chip manufacturers.
- Verification regimes combine technical audits with on-site (or challenge) inspections, device tracking via unique chip IDs, remote telemetry, and cross-border customs intelligence.
- Incident reporting is consolidated through shared repositories, with AISIs authorized to issue rapid risk alerts and coordinate responses (Scholefield et al., 18 Mar 2025, Jeon, 2024).
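The compute-accounting and incident-reporting obligations above lend themselves to a shared, structured record format; the schema and alert rule below are hypothetical illustrations, not formats specified by the Consensus or the cited papers.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class IncidentReport:
    reporter: str             # developer, compute provider, or AISI
    system_id: str            # identifier of the affected model or deployment
    severity: str             # e.g. "advisory", "serious", "critical"
    description: str
    compute_flop: float | None = None  # total pre-training + fine-tuning compute, if known
    chip_ids: tuple[str, ...] = ()     # unique device identifiers, for verification regimes
    reported_at: datetime | None = None

    def __post_init__(self) -> None:
        if self.reported_at is None:
            self.reported_at = datetime.now(timezone.utc)

def should_trigger_rapid_alert(report: IncidentReport) -> bool:
    """Illustrative rule only: alert on any critical incident, or on serious
    incidents involving treaty-scale systems (placeholder 1e26 FLOP cut-off)."""
    if report.severity == "critical":
        return True
    return report.severity == "serious" and (report.compute_flop or 0.0) >= 1e26
```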
A plausible implication is that these mechanisms collectively serve to prevent unilateral escalation (“race to AGI”), encourage state and developer accountability, and facilitate harmonized, enforceable global oversight.
5. Institutional Design, Modular Integration, and Policy Instruments
The Consensus is structurally modular: activities are compartmentalized across three functional domains—technical research/cooperation, safeguards/evaluations, and policymaking/governance support (Castris et al., 2024). These modules can be integrated as a unified stack or selectively adopted, enabling states to enter at differing compliance levels.
Primary institutional elements include:
- Multinational AGI Consortium (“MAGIC”): An intergovernmental body with executive, technical advisory, and independent audit functions. Decisions are consensus- or supermajority-based at the Council level; operational metrics and emergency actions (e.g., kill switch activation) are coordinated by technical and audit sub-units (Miotti et al., 2023).
- AISIs Network: Distributed specialist institutes (modeled after UK/US/Singapore AISI) coordinate threshold adjustment, auditing, and biannual Global AI Safety Assessments (Scholefield et al., 18 Mar 2025).
- Enforcement levers: Treaty obligations are backstopped by domestic legislative measures—licensing of compute, audit access, and sanctions for unregistered or noncompliant training runs (Miotti et al., 2023, Scholefield et al., 18 Mar 2025).
The modular, “plug-and-play” nature of enforcement (e.g., optional inspection power) and a matrix-style implementation framework permit scaling from loose “network of networks” approaches to fully harmonized, multilateral operation (Castris et al., 2024).
6. Research Priorities, Benchmarks, and Metrology
The Consensus elevates research priorities in metrology (precise, repeatable risk measures), evolving benchmarks (resistant to gaming), secure evaluation infrastructure, and system-level safety analysis. Audit protocols emphasize double-blind methods, cryptographically verifiable logs, and dynamic, adversarially-resistant evaluation tasks. Emphasis is also placed on robust model provenance, tamper resistance, and incident-driven updates to risk assessment benchmarks (Bengio et al., 25 Jun 2025).
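One common way to make evaluation logs cryptographically verifiable is to hash-chain successive records; the sketch below shows that general technique on illustrative data and is not an audit protocol prescribed by the Consensus.

```python
import hashlib
import json

def append_entry(log: list[dict], record: dict) -> None:
    """Append an evaluation record, chaining it to the previous entry's hash
    so that later tampering with earlier records is detectable."""
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    body = json.dumps({"record": record, "prev_hash": prev_hash}, sort_keys=True)
    log.append({"record": record,
                "prev_hash": prev_hash,
                "entry_hash": hashlib.sha256(body.encode()).hexdigest()})

def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; False means the log was altered after the fact."""
    prev_hash = "0" * 64
    for entry in log:
        body = json.dumps({"record": entry["record"], "prev_hash": prev_hash},
                          sort_keys=True)
        if (entry["prev_hash"] != prev_hash
                or entry["entry_hash"] != hashlib.sha256(body.encode()).hexdigest()):
            return False
        prev_hash = entry["entry_hash"]
    return True

# Illustrative usage with made-up benchmark results.
audit_log: list[dict] = []
append_entry(audit_log, {"benchmark": "TruthfulQA", "score": 0.62, "auditor": "AISI-A"})
append_entry(audit_log, {"benchmark": "red-team", "findings": 3, "auditor": "AISI-B"})
print(verify_chain(audit_log))  # True
```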
Key extensions over prior frameworks include:
- Finer-grained subdomains in assessment (seven risk topics vs. previous three).
- Emphasis on transparent infrastructure (shared incident logs).
- Integration of scalable oversight and corrigibility research in AGI/ASI contexts (Bengio et al., 25 Jun 2025).
7. Challenges, Enablers, and Harmonization
The Consensus recognizes fundamental challenges: dual mandates (promotion vs. regulation), political capture, lack of natural bottlenecks (unlike nuclear material), global legal divergence, resource imbalances, and the practical difficulty of enforcement (Castris et al., 2024). Enabling factors include:
- Modular institutional design to enable phased or partial accession.
- Multistakeholder governance and cross-sector coordination.
- Alignment with extant fora (OECD, UN) and adoption of “soft law” as interim scaffolding.
- Network effects arising from national institutes and research collaboration.
- High-level political momentum drawing on the Bletchley and Seoul Declarations as diplomatic anchors (Castris et al., 2024).
Incident logging, mutual-recognition standards, capacity-building for lower-resource signatories, and explicit R&D incentives (access to standards, tools, and preferential market access) help bridge resource and compliance gaps (Scholefield et al., 18 Mar 2025).
The Singapore Consensus on Global AI Safety defines a multi-layered architecture for technical and institutional AI risk mitigation. By rooting every requirement in established or near-final ISO/IEC standards and harmonizing hard-law treaty mechanisms with adaptive, networked oversight, it establishes a scalable policy and research template for a global, interoperable approach to AI safety (Bengio et al., 25 Jun 2025, Jeon, 2024, Miotti et al., 2023, Scholefield et al., 18 Mar 2025, Castris et al., 2024).