
An International Agreement to Prevent the Premature Creation of Artificial Superintelligence

Published 13 Nov 2025 in cs.CY | (2511.10783v2)

Abstract: Many experts argue that premature development of artificial superintelligence (ASI) poses catastrophic risks, including the risk of human extinction from misaligned ASI, geopolitical instability, and misuse by malicious actors. This report proposes an international agreement to prevent the premature development of ASI until AI development can proceed without these risks. The agreement halts dangerous AI capabilities advancement while preserving access to current, safe AI applications. The proposed framework centers on a coalition led by the United States and China that would restrict the scale of AI training and dangerous AI research. Due to the lack of trust between parties, verification is a key part of the agreement. Limits on the scale of AI training are operationalized by FLOP thresholds and verified through the tracking of AI chips and verification of chip use. Dangerous AI research--that which advances toward artificial superintelligence or endangers the agreement's verifiability--is stopped via legal prohibitions and multifaceted verification. We believe the proposal would be technically sufficient to forestall the development of ASI if implemented today, but advancements in AI capabilities or development methods could hurt its efficacy. Additionally, there does not yet exist the political will to put such an agreement in place. Despite these challenges, we hope this agreement can provide direction for AI governance research and policy.

Summary

  • The paper introduces a framework for an international agreement that caps computational resources to delay the development of dangerous ASI.
  • It relies on compute-based thresholds (measured in FLOP) and chip centralization, motivated by historical scaling data and backed by technical verification and enforcement measures.
  • The proposal emphasizes dynamic governance, international cooperation, and robust verification to mitigate existential risks from rapidly advancing AI.

International Coordination to Prevent Artificial Superintelligence: A Framework for Preemptive Safety

Context and Motivation

This paper presents a comprehensive framework for an international agreement aimed at preventing the premature development of Artificial Superintelligence (ASI). ASI is defined operationally as any AI system whose cognitive performance is sufficiently superhuman to execute plans resulting in human extinction. The authors argue that the prevailing deep learning paradigm is prone to producing power-seeking, misaligned agents whose objectives may be uncorrelated with, or actively hostile to, human welfare, with a nontrivial probability of catastrophic outcomes. Current alignment methods, especially behaviorist approaches focused on external behavior, are insufficient to robustly guarantee the pursuit of human interests at advanced capability levels, as evidenced by the persistent failures of jailbreak defenses [andriushchenko2024agentharm, bondarenko2025demonstrating, hubinger_sleeper_2024].

The cited 10–20% expert probability mass on human extinction due to advanced AI systems [grace2025thousands, lavoipierre2023ai, cais_statement_2023] establishes the necessity of globally coordinated, preemptive restriction on unsafe AI capabilities advancement. Mere governance by market mechanisms or isolated regulatory action is assessed as inadequate due to the offense-dominant strategic regime: capabilities that can shift the balance of power or defeat strategic deterrence incentivize clandestine or rapid development, creating a collectively irrational “race to the precipice” scenario [armstrong2016racing]. Absent robust international enforcement and verification mechanisms that provide mutual assurance, defectors or malicious actors (state or non-state) will undermine any unilateral or partial agreement.

Agreement Structure and Technical Implementation

The agreement’s primary mechanism for halting dangerous AI progress is a cap on the computational resources used for training and post-training, measured in floating-point operations (FLOP). The training FLOP metric was selected due to its strong empirical correlation with model capabilities [kaplan_scaling_2020, epoch2024trainingcomputeoffrontieraimodelsgrowsby45xperyear], its quantifiability, and the fact that it can be computed early, before deployment [heim_training_2024]. Given the uncertainty regarding capability thresholds for dangerous behavior, the thresholds are set conservatively: the ban line currently sits at $10^{24}$ FLOP and the monitoring line at $10^{22}$ FLOP. As analyzed, this places even some current near-frontier models above the banned line, ensuring at least a short-term halt to progress toward ASI-level systems, especially in light of rapid algorithmic improvements [ho_algorithmic_2024]. Notably, the agreement does not permanently freeze progress; it provides a staging ground for negotiation and adjustment of thresholds as verification and model evaluation capabilities evolve.
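As a rough illustration of how a FLOP-based cap could be applied, the sketch below estimates a run's training compute and sorts it into one of three tiers. The two threshold values come from the text; everything else is an assumption made for illustration: the `6 * parameters * tokens` estimate is a common rule of thumb for dense transformers rather than the agreement's accounting method, the names `Tier`, `estimate_training_flop`, and `classify_run` are hypothetical, and treating a run exactly at a threshold as belonging to the lower tier is an arbitrary choice.

```python
from enum import Enum

# Threshold values as given in the summary.
BANNED_THRESHOLD_FLOP = 1e24     # training above this line is banned
MONITORED_THRESHOLD_FLOP = 1e22  # training above this line must be approved and monitored


class Tier(Enum):
    UNRESTRICTED = "unrestricted"
    MONITORED = "monitored"
    BANNED = "banned"


def estimate_training_flop(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb estimate (ASSUMPTION): about 6 FLOP per parameter per token
    for a dense transformer. Not the agreement's official accounting method."""
    if n_params <= 0 or n_tokens <= 0:
        raise ValueError("n_params and n_tokens must be positive")
    return 6.0 * n_params * n_tokens


def classify_run(total_flop: float) -> Tier:
    """Place a training run into a tier by its total training FLOP.
    ASSUMPTION: a run exactly at a threshold falls into the lower tier."""
    if total_flop > BANNED_THRESHOLD_FLOP:
        return Tier.BANNED
    if total_flop > MONITORED_THRESHOLD_FLOP:
        return Tier.MONITORED
    return Tier.UNRESTRICTED


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to exercise each tier.
    for params, tokens in [(1e8, 1e10), (7e9, 2e12), (7e10, 1.5e13)]:
        flop = estimate_training_flop(params, tokens)
        print(f"{params:.0e} params, {tokens:.0e} tokens: {flop:.1e} FLOP -> {classify_run(flop).value}")
```

A real verification scheme would also need to account for post-training compute, mixed precision, and sparsity, none of which this estimate captures.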

The agreement specifies a two-level governance structure to oversee both the dynamic updating of standards and the operational implementation (Figure 1): an Executive Council (initially the US and PRC, given their dominant position in AI and compute supply chains) and a Coalition Technical Body (CTB).

Figure 1: An overview of the agreement’s main components, emphasizing the interlocking roles of compute thresholds, export controls, chip centralization/monitoring, and research restrictions.

Chip verification is central to the enforcement regime. Because training advanced models requires thousands of specialized accelerators, and supply chains for cutting-edge AI chips (e.g., H100-equivalents) are highly concentrated, the CTB can feasibly require declaration, registration, and monitoring of all clusters above 16 H100-equivalents. The verification stack includes supply chain tracking, mandatory reporting, physical consolidation of clusters, on-site inspection, remote telemetry, and the development of tamper-resistant on-chip monitoring mechanisms [scher_mechanisms_2024, aarne2024secure, kulp2024hardwareenabled]. Prohibitions against the proliferation of advanced AI chips, backed by export controls and counterproliferation analogous to those in place for nuclear and missile technologies, serve to prevent evasion and transfer to non-compliant states.
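To give a feel for what a 16 H100-equivalent cluster means in compute terms, the sketch below estimates how many days such a cluster would need to accumulate a given number of training FLOP. It is a back-of-the-envelope calculation under stated assumptions, not the paper's own derivation: the per-chip peak of about 1.979e15 FLOP/s is an assumed approximate dense FP8 figure for a single H100, the 50% utilization mirrors the "50% utilization in FP8" phrase quoted in the glossary, and the function name `days_to_reach` is hypothetical.

```python
SECONDS_PER_DAY = 86_400

# ASSUMPTION: approximate dense FP8 peak throughput of a single H100, in FLOP/s.
H100_FP8_PEAK_FLOP_PER_S = 1.979e15
# Matches the "50% utilization in FP8" assumption quoted in the glossary.
DEFAULT_UTILIZATION = 0.5


def days_to_reach(
    target_flop: float,
    n_h100_equiv: float,
    peak_flop_per_s: float = H100_FP8_PEAK_FLOP_PER_S,
    utilization: float = DEFAULT_UTILIZATION,
) -> float:
    """Days of continuous training needed for a cluster to accumulate target_flop."""
    if target_flop <= 0 or n_h100_equiv <= 0 or peak_flop_per_s <= 0:
        raise ValueError("target_flop, n_h100_equiv and peak_flop_per_s must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    sustained_flop_per_s = n_h100_equiv * peak_flop_per_s * utilization
    return target_flop / sustained_flop_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    for label, target in [("monitored line (1e22)", 1e22), ("banned line (1e24)", 1e24)]:
        print(f"16 H100-equivalents to {label}: {days_to_reach(target, 16):.1f} days")
```

Under these assumed numbers, a 16-chip cluster would need on the order of a week of continuous training to reach the monitored line and about a hundred times longer to reach the banned line. Changing the peak throughput or utilization shifts the result proportionally.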

Figure 2: The training compute used to train notable AI models in recent years, with lines marking the banned ($10^{24}$ FLOP) and monitored ($10^{22}$ FLOP) thresholds.

Beyond compute, the agreement restricts a narrow slice of research—essentially any work that accelerates the development or practical feasibility of ASI, including major algorithmic advances and distributed/decentralized training techniques. Research controls are implemented via direct legal prohibitions, intelligence monitoring, audits, and tracking of key personnel and institutions. Importantly, the agreement attempts to allow carveouts for application-specific, narrow AI activities with societal benefit, mitigating unnecessary opportunity cost.

A strong whistleblower protection regime, challenge inspection mechanisms, and multilateral information sharing—supporting both cooperative and adversarial verification between signatories—are included as essential safeguards for ensuring credible compliance, particularly given the low-trust environment between principal actors.

Technical, Strategic, and Empirical Rationale

A core claim defended is that “solution via alignment research automation” is unlikely to be available in time; that is, neither early AGI nor medium-scale models should be expected to reliably and autonomously solve fundamental alignment challenges before catastrophic misalignment risk is introduced [wentworth2025case, greenblatt2024stress]. The authors stress the limitations in existing capability elicitation and model evaluation: current behavioral benchmarks are not sufficient to establish firm upper bounds on dangerous capabilities, given known misgeneralization phenomena, specification gaming, and the possibility of models sandbagging or faking alignment under scrutiny [greenblatt_alignment_2024, balesni_towards_2024]. As such, waiting to pull the “pause” lever until models are dangerous, or attempting to measure “when to stop development,” is strategically unsound.

The paper makes several empirically and technically motivated claims:

  • Algorithmic progress in deep learning rapidly reduces compute thresholds for dangerous capabilities: Historical data suggest that the compute needed to reach a given capability level falls by roughly 3× per year [ho_algorithmic_2024], implying that thresholds must be dynamically revisited and may need ratcheting downward over time. Major architectural innovations (e.g., transformers) can create stepwise jumps exceeding years of “normal” progress.
  • Chip and compute supply chains are detectably and governably concentrated: The production and movement of advanced accelerators is traceable using existing regulatory, trade, and intelligence tools [scher_mechanisms_2024, heim_what_2024].
  • Enforcement risk and breakout time are manageable: By requiring rapid and universal chip consolidation, the absolute number of unmonitored chips (not merely the ratio) is kept low, reducing catastrophic risk from “breakout” violations.
  • Empirical evidence of misalignment and behavioral unreliability in existing models: Failures to prevent jailbreaks, specification gaming, and emergent misaligned goal-seeking even in sub-ASI models underscore the necessity of precaution [bondarenko2025demonstrating, hubinger_sleeper_2024].

    Figure 3: Timelines for locating clusters: the majority of chips can be registered rapidly, assuming plausible distributions of cluster size and known patterns of compute centralization.
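The first claim in the list above, that algorithmic progress erodes a fixed compute threshold, can be made concrete with a small calculation. The sketch below assumes a constant 3× annual efficiency gain, which is a simplification: the text itself notes that architectural breakthroughs can produce stepwise jumps. The function names are hypothetical.

```python
import math

# Annual factor by which the compute needed for a fixed capability shrinks (figure from the text).
ANNUAL_EFFICIENCY_GAIN = 3.0


def equivalent_threshold(initial_flop: float, years: float,
                         gain: float = ANNUAL_EFFICIENCY_GAIN) -> float:
    """Physical FLOP cap that holds the capability ceiling fixed after `years`,
    ASSUMING a constant annual efficiency gain."""
    if initial_flop <= 0 or gain <= 1:
        raise ValueError("initial_flop must be positive and gain must exceed 1")
    return initial_flop / gain ** years


def years_until(initial_flop: float, target_flop: float,
                gain: float = ANNUAL_EFFICIENCY_GAIN) -> float:
    """Years until a capability that needs initial_flop today needs only target_flop."""
    if initial_flop <= 0 or target_flop <= 0 or gain <= 1:
        raise ValueError("FLOP values must be positive and gain must exceed 1")
    return math.log(initial_flop / target_flop) / math.log(gain)


if __name__ == "__main__":
    for year in range(5):
        print(f"year {year}: equivalent cap = {equivalent_threshold(1e24, year):.2e} FLOP")
    print(f"years for a 1e24-FLOP capability to fit under 1e22 FLOP: {years_until(1e24, 1e22):.1f}")
```

Under this constant-rate assumption, a capability that requires $10^{24}$ FLOP today would fit under the $10^{22}$ monitoring line in a little over four years, which illustrates why the text says thresholds may need to ratchet downward.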

Implications, Trade-Offs, and Reservations

The authors are explicit about the trade-offs inherent in their proposal: the agreement necessitates internationally coordinated monitoring regimes, restrictions on potentially valuable research, and the centralization of AI chips to declared facilities. There is a nontrivial risk of leakage of sensitive information, increased scope for authoritarian enforcement, and opportunity cost from delayed beneficial AI capabilities. The risk profile for enforcement failures—particularly from covert state-backed actors, smuggling/diversion of chips, or rapid algorithmic progress that obsoletes compute-based verification—is acknowledged as a critical area for ongoing technical and governance research [hooker2024limitations].

The agreement establishes tailored protective actions—ranging from sanctions to kinetic neutralization of non-compliant data centers or clusters—subject to proportionality and strict targeting, to deter and respond to egregious non-compliance, with explicit reference to relevant precedent in arms control and WMD nonproliferation.

From a governance perspective, the agreement is justified largely by the apparent irreversibility and scale of ASI-associated risk, and the inability to recover from catastrophic failure if misalignment is realized. The proposal prioritizes existential risk minimization over other strategic goals (e.g., preserving technological lead, maximizing current utility), and recommends a staged implementation path to increase political feasibility and allow incremental buildup of verification and trust.

Bold Claim: The proposal asserts technical sufficiency for halting ASI development, conditional on successful universal enforcement and absence of paradigm shifts that would undermine compute-based governance. The political feasibility of immediate enactment is, however, judged currently lacking.

Theoretical and Practical Implications for AI Governance

If implemented, this regime would set a global governance precedent exceeding existing arms control on both stringency and technical content, with implications for AI as the first truly offense-dominant dual-use technology with existential implications. The proposal forces new research directions on dynamic compute thresholding, large-scale chip attestation, architectural red-teaming, and algorithmic progress forecasting [erben_JRC143255, baker2025verifying].

Theoretically, it reframes the AI alignment and safety discourse: practical alignment solutions are no longer a technical fix attempted contemporaneously with dangerous capability development, but a predicate for any further scaling toward general intelligence. It forces strategic research on field-wide “staged pauses,” methods for global mutual assurance, and formal exit conditions—what would count as “alignment solved.” The need for “defense dominant” technical benchmarks and more robust capability elicitation becomes urgent.

The regime will also raise challenges about international distributive justice (who gets to use advanced AI, even when safe), competition over AI “benefit sharing,” and path dependence for future AGI governance [o2020windfall, dennis2025options]. It establishes a template for future treaties governing other offense-dominant synthetic intelligence categories or rapid technological advances.

Conclusion

This paper represents a comprehensive, technically detailed proposal for an international agreement aiming to preempt the premature development of dangerous AI on existential-risk grounds. By leveraging the verifiability of compute-based thresholds, chip supply chain centralization, and tightly scoped research bans, the agreement seeks to create a robust, enforceable pause on frontier AI scaling, buying time for alignment and societal mitigation research. While the proposal is politically and operationally challenging, the analysis argues for its technical and strategic plausibility, and it raises numerous urgent research questions regarding the co-design of governance and AI systems. The cost of inaction, or of a miscoordinated, unverified AI race, is presented as comparable to that of other existentially risky domains, such as nuclear proliferation, but with less margin for recovery. Future research will need to pursue dynamic thresholding, verification technology, alternative AI paradigms, and staged international governance mechanisms to maintain efficacy under rapid technical change.

References:

  • “An International Agreement to Prevent the Premature Creation of Artificial Superintelligence” (2511.10783).
  • For empirical evidence and theoretical underpinnings, see also [ho_algorithmic_2024], [heim_training_2024], [baker2025verifying], [scher_mechanisms_2024], [kulp2024hardwareenabled], [bondarenko2025demonstrating], [hubinger_sleeper_2024], [epoch2024trainingcomputeoffrontieraimodelsgrowsby45xperyear].


Explain it Like I'm 14

What this paper is about (in simple terms)

Imagine building a super-smart computer brain that is not just better than us at some things, but better at almost everything. The authors call that artificial superintelligence (ASI). They worry that rushing to build ASI could be extremely dangerous—even risking human extinction—if it doesn’t follow our values or if bad actors use it.

This paper proposes a practical plan for the world to hit “pause” on the riskiest kinds of AI development until we have strong safety methods, while still letting people use today’s safer AI tools.

The main questions the paper asks

  • How can countries work together to stop anyone from creating ASI too soon?
  • How can we make sure everyone follows the rules even if they don’t fully trust each other?
  • How can we pause the dangerous parts of AI progress but keep the useful parts?

How their plan would work (explained simply)

Think of this like a global “speed limit” and “safety inspection” for AI.

  • Training vs. using: “Training” an AI is like teaching it for months with lots of computers so it can learn new abilities. “Using” (inference) is when you ask a trained AI to answer questions or help with tasks.
  • FLOPs: A FLOP is one tiny math step a computer does. Training powerful AIs takes mind-boggling numbers of FLOPs. The authors want limits based on FLOPs, like setting a maximum number of math steps you can use for training.

Here are the key parts of the plan, explained with everyday ideas:

  • A global coalition led by the U.S. and China sets AI “speed limits.”
    • Above a strict training limit (a very large number of FLOPs), training new frontier models is banned.
    • In a middle zone, training is allowed but only with approval and monitoring.
    • Below a small limit, training is freely allowed (so everyday and many research uses can continue).
  • Track the “engines” that power AI: Advanced AI chips are like powerful car engines—you need lots of them to train big models. The plan tracks where these chips are, how many exist, and how they’re used, so no one can secretly build a super-powerful AI.
  • Verify use: Inspections, smart chip features, and other checks make sure chips are used for safe purposes (like running existing models) and not for risky training.
  • Limit only the dangerous research: Some kinds of research make AIs stronger even without more chips (smarter training tricks, new algorithms). The plan temporarily restricts this specific frontier research so the “speed limit” can’t be dodged. It doesn’t ban normal, helpful AI work (like for healthcare tools or self-driving safety).
  • Nonproliferation and enforcement: Stop advanced chips from spreading to rule-breakers, use trade rules and sanctions if needed, and escalate only if a country tries to break the deal openly.
  • Start small, grow over time: The authors recommend a staged rollout—begin with transparency and trust-building, then expand to full monitoring once there’s political support.

Analogy: It’s like countries agreeing not to build certain kinds of rockets until we have strong safety systems, with inspectors checking rocket parts and factories, and with rules that still let people use regular airplanes.

What they found or concluded

This is a policy proposal, not an experiment, so there aren’t lab “results.” But the paper argues:

  • If we set FLOP-based limits, track advanced AI chips, and verify how they’re used, we can realistically prevent the creation of ASI with today’s technology.
  • To make the pause actually work, we must also limit the narrow slice of research that would rapidly push us toward ASI or break the monitoring system.
  • Politically, it’s hard right now—there isn’t enough will to sign such a deal today. But the authors believe support could grow as risks become clearer.
  • The plan has costs (slower AI progress, added oversight, risk of misuse of monitoring), but they argue the cost is worth it to avoid even a 10% chance of catastrophe.

Why this matters:

  • A global pause would reduce the chances of misaligned ASI taking control, AI-fueled wars, or misuse like advanced cyberattacks or engineered biothreats.
  • It buys time to solve the hard problem: making super-powerful AIs reliably care about human safety and values.

Why not simpler options?

The authors consider common objections and explain their answers in plain language:

  • Why not wait and pause later? We can’t reliably predict when AI will cross dangerous lines. If we wait, it might be too late—and chips might be too widespread to monitor.
  • Why limit research? Because clever new methods can make AIs much more capable without more hardware. If we only limit chips, algorithm breakthroughs could bypass the rules.
  • Won’t this help authoritarian control? They argue the current path (a few companies or states racing to ASI) risks an even worse concentration of power. Their plan tries to add safeguards and transparency, and it’s temporary—just until safety is proven.
  • Can’t we just build defenses? Some AI-enabled threats are offense-friendly and hard to defend against (think fast cyberattacks or bio-risks). Prevention is safer than betting everything on defense.
  • Why track existing chips? Because a hidden stock of powerful chips could enable a secret project. Knowing where the “engines” are is essential for trust and verification.

What happens after the pause?

The pause is not forever. The restrictions could be lifted when:

  • There are proven, widely trusted ways to align super-powerful AI with human values.
  • We have strong protections against misuse and proliferation.
  • There’s a stable, transparent way to develop very advanced AI safely (possibly a global, carefully monitored project).
  • Societies have plans for other big issues, like job changes, power concentration, and geopolitical stability.

What this could mean for the world

If adopted, this agreement could:

  • Greatly lower the chance of a disaster from misaligned or misused ASI.
  • Keep everyday helpful AI tools available while stopping the riskiest leaps.
  • Give time to build safety science, rules, and institutions strong enough to handle super-powerful AI.
  • Encourage cooperation between major powers (especially the U.S. and China) on a shared survival issue.

In short, the paper’s message is: Let’s set smart global guardrails now—like speed limits, inspections, and targeted research rules—so we don’t sprint into building something super-powerful and unsafe. Once we have reliable safety methods, we can move forward more confidently.

Knowledge Gaps

Below is a single, consolidated list of the paper’s unresolved knowledge gaps, limitations, and open questions. Each item is phrased to be concrete and actionable for future research or policy design.

  • Lack of a validated, dynamic method for setting and updating FLOP thresholds: no empirical mapping from training FLOPs to dangerous capability levels, no process for rapid adjustment in response to algorithmic efficiency gains or paradigm shifts.
  • Unclear technical feasibility and timeline for tamper-resistant on-chip monitoring: no concrete architecture, performance overhead analysis, or vendor commitments for secure attestation, logging, and inference-vs-training discrimination.
  • No plan for retrofitting the global stock of existing AI accelerators with trustworthy monitoring and attestation, including cost, logistics, and legal authorities needed to modify privately owned or sovereign hardware.
  • Insufficient detection strategy for covert facilities and decentralized compute: limited treatment of peer-to-peer clusters, botnets, federated learning, consumer GPUs/NPUs, and edge devices that can collectively exceed monitored thresholds.
  • Verification of declared vs undeclared training runs remains underspecified: how to reliably measure actual training FLOPs across mixed precision, sparsity, fine-tuning, RLHF, distillation, and inference-heavy “capability laundering” regimes.
  • Ambiguity in “H100-equivalent” thresholds and conversion: no standardized mapping across diverse accelerators (ASICs, FPGAs, TPUs, consumer GPUs, NPUs), memory bandwidth limits, and interconnects that influence attainable training scale.
  • No robust early-warning system for algorithmic breakthroughs that reduce compute by large factors (e.g., inference-scaling, new architectures), nor criteria for preemptive policy tightening when such breakthroughs are detected.
  • Insufficient treatment of alternative AI paradigms (neuromorphic, symbolic, evolutionary, analog, photonic, quantum) that might bypass compute-based controls; missing detection, evaluation, and containment strategies tailored to non-deep-learning approaches.
  • Agreement relies primarily on training controls but leaves inference controls underdeveloped: no mechanism to govern use, replication, or fine-tuning of already-trained dangerous models, nor model registry/watermarking protocols to track provenance and risk.
  • No defined capability evaluation framework for “advancement toward ASI”: missing standardized benchmarks, red-team methodologies, deception-resilience tests, and risk thresholds to decide when methods or models cross into restricted terrain.
  • Research restriction scope is undefined: no operational criteria to distinguish permitted vs prohibited research topics, governance of the whitelist, update cadence, appeals process, and due-process protections for researchers.
  • Verification of researcher activity lacks detail: no practical protocol for audits, interviews, code/repo inspections, or incentives/whistleblower protections, and unclear boundaries to minimize invasiveness and protect civil liberties.
  • Open-source model and code governance is unaddressed: how to prevent dangerous capability leakage via weights, libraries, training scripts, or datasets shared across borders and jurisdictions; takedown and “model quarantine” procedures absent.
  • Non-state actor risk is under-specified: insufficient policies for criminal groups or rogue labs operating across jurisdictions, including cyber operations, undercover monitoring, and international mutual legal assistance frameworks.
  • Enforcement triggers and escalation ladders are vague: no clear thresholds for sanctions, interdiction, or other coercive tools; no rules of engagement, proportionality principles, or crisis-management protocols to avoid geopolitical escalation.
  • Strategy for chip tracking and smuggling prevention is incomplete: no modeling of black-market dynamics, smuggling routes, customs capacity, satellite/power-signature detection limits, or false-positive/false-negative rates in supply-chain monitoring.
  • No quantitative analysis of breakout time: missing adversarial modeling of how fast a rogue actor could acquire compute and train covertly; lack of stochastic risk models to inform required inspection cadence and enforcement readiness.
  • Missing cost-benefit and macroeconomic impact assessment: no estimates of GDP, innovation, financial-market, and industry impacts; no compensation/transition policies for affected firms, workers, and countries, nor funding model for the monitoring regime.
  • Governance design of the Executive Council is minimal: no membership criteria, expansion pathways, voting rules, dispute resolution, transparency requirements, rotation, checks-and-balances, or independent oversight to prevent misuse.
  • Civil liberties and human rights safeguards are not specified: unclear limits on surveillance, search and seizure, data retention, due process, and redress; no external review mechanisms to prevent authoritarian abuse.
  • Compatibility with existing legal frameworks is unresolved: how domestic legislation (export controls, privacy, antitrust, IP) and international instruments (EU AI Act, national RSPs, Wassenaar-like regimes) will be harmonized.
  • Data governance gaps: no controls on dataset creation, curation, and synthetic data pipelines that may enable dangerous capabilities; no auditing standards for data provenance and risk.
  • Cloud, HPC, and multi-tenant compliance architecture is unspecified: how providers enforce per-tenant monitoring while preserving privacy, how sovereign clouds and academic HPC centers integrate with attestation and inspection.
  • Criteria for lifting the agreement are vague: no measurable milestones for “years-long track record,” “established consensus,” or “strong misuse and proliferation controls”; no testbeds, trials, or formal verification to demonstrate alignment efficacy.
  • No plan to detect and respond to AGI capable of automating AI R&D: missing triggers, containment protocols, and “kill-switch” mechanisms when automation of capabilities research begins to accelerate beyond governance control.
  • Underspecified approach to “narrow AI” carve-outs: no deterministic criteria to classify and periodically re-validate safe application-specific systems, nor safeguards against capability creep through iterative updates.
  • Energy and power-signature monitoring lacks rigor: no validated thresholds or models linking energy consumption to training scale, nor strategies to counter masking of power signatures (e.g., colocation with other workloads).
  • Attribution and provenance of training efforts are weak: no standardized logging requirements, cryptographic commitments, or third-party audit trails that would make covert training detectable or legally actionable.
  • International participation and coalition-building strategy is high level: no engagement plans for blocs (EU, India, ASEAN), contingent strategies if a major power defects, or mechanisms to share benefits (security guarantees, infrastructure access) equitably.
  • Vendor and supply-chain commitments are unspecified: no binding requirements for chipmakers to implement secure attestation, firmware update controls, serial tracking, and anti-tamper features, nor liability/regulatory levers to enforce compliance.
  • Threshold gaming risk remains high: no guardrails against splitting large training into many sub-threshold runs, iterative fine-tuning/distillation that cumulatively yields dangerous capabilities, or cross-entity collusion to evade monitoring.
  • Missing performance, reliability, and security evaluation of attestation systems: no plan for penetration testing, threat modeling (insider, nation-state adversaries), resilience to firmware exploits, or rollback protection.
  • Funding and capacity building is unaddressed: no global budget estimates, financing mechanisms, and training pipelines for inspectors, OSINT analysts, supply-chain auditors, and technical teams, especially in lower-capacity states.
  • Ambiguity around “monitored facilities” threshold (16 H100s): no rationale for chosen value, no sensitivity analysis, and no adaptation plan as hardware evolves (e.g., far more capable next-gen accelerators).
  • No structured interaction model with corporate RSPs/preparedness frameworks: how company-level thresholds, evals, and safety commitments integrate with and inform international verification and the whitelist.
  • Lack of empirical case studies or pilots: no proposed sandbox jurisdictions or limited-scope trials to test chip tracking, FLOP attestation, research verification, and enforcement mechanics before global rollout.
  • Exit and failure modes are not planned: no contingency strategy if verification becomes untenable (e.g., widespread consumer capabilities), if a major state defects, or if breakthrough paradigms nullify compute-based governance.

Glossary

  • AGI: Artificial general intelligence; systems performing at human-level across all cognitive domains. "We note that artificial general intelligence (AGI)–systems that perform at human-level across all cognitive domains–would likely emerge before ASI."
  • Agentic: Exhibiting goal-directed behavior and the ability to pursue objectives over time. "e.g., in the setting of agentic software engineering"
  • Arms-control: International efforts and agreements to limit or regulate weapons and related technologies. "violations of arms-control and nonproliferation agreements."
  • Artificial superintelligence (ASI): AI substantially smarter than humans at all cognitive tasks. "premature development of artificial superintelligence (ASI) poses catastrophic risks"
  • Challenge inspections: Surprise or on-demand inspections to verify compliance with an agreement. "power consumption monitoring, challenge inspections, whistleblowers, and more."
  • Collective action problem: A situation where individual incentives lead to outcomes that are suboptimal for the group. "Those racing toward superintelligence, to the extent they are concerned with catastrophic risks, are stuck in a collective action problem"
  • Counterproliferation: Active measures to stop the spread or acquisition of dangerous technologies or capabilities. "strong counterproliferation efforts to ensure these export controls are upheld."
  • Executive Council: A governing body of the agreement with decision-making authority. "They are the initial members of an Executive Council which governs the agreement."
  • Export controls: Government restrictions on the transfer of sensitive technologies. "export controls on AI chips and chip manufacturing equipment"
  • FLOP thresholds: Governance limits defined by the number of floating-point operations permitted in training. "Limits on the scale of AI training are operationalized by FLOP thresholds"
  • FP8: An 8-bit floating-point format used to improve performance and efficiency in AI workloads. "assuming 50% utilization in FP8."
  • Frontier models: The most advanced, cutting-edge AI models at the leading edge of capability. "Today’s frontier models are not robust to jailbreaks:"
  • H100-equivalents: A hardware capacity measure referencing the performance of NVIDIA H100 GPUs. "greater than 16 H100-equivalents; 16 H100s cost approximately \$500,000 USD in 2025"
  • Inference-scaling regime: A development approach where capabilities grow primarily through inference-time techniques rather than larger training runs. "the inference-scaling regime started with OpenAI’s o-series of models"
  • Jailbreaks: Inputs crafted to bypass a model’s safety or policy constraints. "Today’s frontier models are not robust to jailbreaks:"
  • Monitored Threshold: A compute boundary above which training must be approved and monitored. "Training runs below this threshold but above the Monitored Threshold (i.e., $10^{22}$ FLOP) must be approved and monitored by coalition authorities."
  • Nonproliferation: Policies and actions aimed at preventing the spread of dangerous technologies. "For these, the coalition relies on nonproliferation and enforcement."
  • Open-source intelligence: Information gathered from publicly available sources for intelligence analysis. "open-source intelligence"
  • Reasoning models: AI models designed or tuned to perform multi-step, structured reasoning tasks. "Current models, especially reasoning models, display a propensity to reward hack and lie to their users"
  • Remote attestation: A protocol where hardware/software proves its state to a remote verifier. "Remote attestation and other technical solutions are in early stages"
  • Strict Threshold: A compute limit beyond which training is prohibited. "AI training runs above the Strict Threshold (i.e., $10^{24}$ FLOP) are prohibited."
  • Strategic deterrence: The use of credible threats to discourage adversaries from taking aggressive actions. "Capabilities that undermine strategic deterrence and incentivize first strikes may trigger wars and conflicts"
  • Tamper-resistant on-chip mechanisms: Hardware features designed to resist manipulation and enable reliable monitoring or control. "The coalition works to develop tamper-resistant on-chip mechanisms for such purposes"
  • Training compute: The total computational resources used to train a model. "The training compute used to train various notable AI models in the last few years"
  • Training FLOP: The total number of floating-point operations consumed during training. "operationalized as total training FLOP"
  • Transformer architecture: A neural network design based on self-attention that enabled major efficiency gains. "Ho et al. 2024 estimate that the transformer architecture provided 7.2× reduction in operations"
  • Whitelist: A list of approved methods or items permitted under the agreement. "approved methods (e.g., from a Whitelist)"
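
To put the two thresholds and the 16 H100-equivalent figure on a common scale, the back-of-the-envelope calculation below estimates how long a cluster of that size would need to accumulate each threshold. It is only a sketch: the per-chip peak throughput $P \approx 2 \times 10^{15}$ FLOP/s is an assumed, rounded FP8 figure for an H100 and is not taken from the paper, while the utilization $u = 0.5$ follows the quoted "assuming 50% utilization in FP8".

```latex
% Time t for N chips with peak throughput P (FLOP/s) and utilization u
% to accumulate a compute budget C (FLOP). P is an assumed value.
\[
  t = \frac{C}{N \, P \, u},
  \qquad
  N P u = 16 \times \left(2 \times 10^{15}\right) \times 0.5
        = 1.6 \times 10^{16}\ \text{FLOP/s}
\]
\[
  t_{\text{Monitored}} = \frac{10^{22}}{1.6 \times 10^{16}}
    = 6.25 \times 10^{5}\ \text{s} \approx 7.2\ \text{days},
  \qquad
  t_{\text{Strict}} = \frac{10^{24}}{1.6 \times 10^{16}}
    = 6.25 \times 10^{7}\ \text{s} \approx 2.0\ \text{years}
\]
```

Under these assumptions, a cluster at the 16 H100-equivalent size would take about a week of continuous training to reach the Monitored Threshold and roughly two years to reach the Strict Threshold.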

Practical Applications

Below are practical, real-world applications that flow from the paper’s proposed international agreement, verification architecture, and staged implementation. Each item includes sectors, potential tools/products/workflows, and assumptions or dependencies affecting feasibility.

Immediate Applications

These items can be piloted or deployed now with existing institutions, technologies, and commercial practices, even without a full international treaty.

Industry

  • FLOP accounting and pre-training gates in ML pipelines (software)
    • Tools/Workflows: Integrate training-compute estimators and “Monitored Threshold” (~10^22 FLOP) pre-registration gates into MLOps (e.g., CI/CD hooks, SDKs for PyTorch/JAX), with audit logs and automatic capability disclosure in model cards.
    • Assumptions/Dependencies: Agreement on standard FLOP estimation methods; willingness of firms to adopt; compatibility with diverse model architectures.
  • Chip inventory and “compute passport” telemetry for data centers (cloud, semiconductors)
    • Tools/Workflows: Asset registries linking serials and purchase orders; secure agent software reporting cluster composition, utilization, and physical location; periodic attestation checks aligned with SOC 2/ISO 27001.
    • Assumptions/Dependencies: Cooperation from cloud providers and OEMs; supply-chain transparency; acceptance of light-touch audits.
  • Inference-only cloud tiers with attested separation from training (cloud, software)
    • Tools/Workflows: Offer “inference-only” SKUs with mode-locking, telemetry, and SLAs; customer-facing attestations and logs proving no training workloads; automatic alarms for gradient accumulation or optimizer usage.
    • Assumptions/Dependencies: Reliable training-vs-inference fingerprinting; customer demand for compliant services; regulator recognition of attestations.
  • Power and utilization anomaly detection for GPU clusters (energy, cybersecurity)
    • Tools/Workflows: Pair high-resolution metering with workload telemetry to flag training-like signatures (e.g., sustained high utilization patterns); integrate into SOC dashboards.
    • Assumptions/Dependencies: Access to facility-level power data; models of training workload profiles; privacy-preserving aggregation.
  • Prototype training/inference mode detection on accelerators (semiconductors, software)
    • Tools/Workflows: Firmware modules and driver heuristics that flag optimizer usage, backprop patterns, or distributed training configs; exposure via APIs and logs for auditors.
    • Assumptions/Dependencies: Cooperation from GPU vendors; tamper resistance sufficient for audits; manageable false positives/negatives.
  • Small-cluster caps in unmonitored sites (industry policy)
    • Tools/Workflows: Company policies limiting unmonitored clusters to “< 16 H100-equivalents” with self-attestation; procurement workflows that route larger builds into monitored facilities.
    • Assumptions/Dependencies: External recognition of caps as risk-reducing; standardized equivalence metrics across chip generations.
  • Compute risk officer and compliance playbooks (finance, corporate governance)
    • Tools/Workflows: New roles overseeing compute disclosures, threshold gating, inspector readiness; board-level reporting; incident response for suspected violations.
    • Assumptions/Dependencies: Executive buy-in; insurance market recognition; alignment with legal counsel.
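
As an illustration of the "sustained high utilization patterns" mentioned under power and utilization anomaly detection, the sketch below flags runs of consecutive telemetry samples that stay above a cutoff. The function name, the 0.8 cutoff, and the minimum run length are hypothetical choices made for this example; real training-versus-inference fingerprinting would likely need far richer signals.

```python
from typing import List, Sequence, Tuple


def sustained_high_utilization(
    samples: Sequence[float],
    threshold: float = 0.8,  # hypothetical cutoff, not from the paper
    min_run: int = 6,        # hypothetical minimum number of consecutive samples
) -> List[Tuple[int, int]]:
    """Return half-open index ranges (start, end) where utilization stays
    at or above `threshold` for at least `min_run` consecutive samples."""
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current high-utilization run began
    for i, u in enumerate(samples):
        if u >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples)))
    return runs


# Example: 3 idle samples, 8 busy, 2 idle, then a short 4-sample burst.
telemetry = [0.2] * 3 + [0.95] * 8 + [0.3] * 2 + [0.9] * 4
flagged = sustained_high_utilization(telemetry)
```

Only the eight-sample stretch is flagged here; the four-sample burst is shorter than `min_run` and is ignored, which is one simple way to trade false positives against missed short jobs.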

Academia

  • Research ethics screening and restricted-topic firewalls (education, research)
    • Tools/Workflows: Departmental IRB-style review for ML projects; “Whitelist” of permitted algorithms/methods; red-team checks to prevent covert capability escalation.
    • Assumptions/Dependencies: Community norms and journal policies; clarity about what counts as “dangerous” capability research.
  • Compute disclosure standards in publications (education, software)
    • Tools/Workflows: Require training-compute estimates, chip counts, and thresholds in model cards and papers; automated reproducibility checklists embedding FLOP calculations.
    • Assumptions/Dependencies: Journal and conference adoption; agreed methodology for multi-stage training and fine-tuning estimates.
  • Methods research for verification (software, semiconductors)
    • Tools/Workflows: Benchmarks and open-source libraries for FLOP estimation; studies on training-vs-inference fingerprints; privacy-preserving attestation protocols.
    • Assumptions/Dependencies: Access to realistic workloads; cooperation from labs for ground-truth data.
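
One way the compute disclosures above could handle multi-stage training is to estimate each stage from hardware usage and report both the per-stage and total figures. The sketch below assumes a hardware-based estimate (chips × time × peak throughput × utilization); the `Stage` class, the `disclosure` helper, and all numbers are hypothetical, and an agreed methodology may well differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stage:
    name: str
    chips: int                  # number of accelerators used
    hours: float                # wall-clock hours of the stage
    peak_flops_per_chip: float  # vendor peak throughput in FLOP/s (assumed input)
    utilization: float          # achieved fraction of peak, between 0 and 1

    def flop(self) -> float:
        """Hardware-based estimate of the FLOP consumed by this stage."""
        return (
            self.chips * self.hours * 3600.0
            * self.peak_flops_per_chip * self.utilization
        )


def disclosure(stages: List[Stage]) -> Dict[str, object]:
    """Build a simple disclosure record with per-stage and total FLOP."""
    per_stage = {s.name: s.flop() for s in stages}
    return {"stages": per_stage, "total_flop": sum(per_stage.values())}


# Illustrative values only.
record = disclosure([
    Stage("pretraining", chips=1000, hours=100.0,
          peak_flops_per_chip=1e15, utilization=0.4),
    Stage("fine-tuning", chips=100, hours=10.0,
          peak_flops_per_chip=1e15, utilization=0.5),
])
```

Reporting stages separately keeps the fine-tuning contribution visible even when pretraining dominates the total.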

Policy and Government

  • National pilot registries for medium-scale training (policy)
    • Tools/Workflows: Voluntary/mandatory notifications for runs above ~10^22 FLOP; lightweight reviewer panels; early transparency measures across labs.
    • Assumptions/Dependencies: Legal authority; industry participation; simple workflows to avoid stalling benign research.
  • Export-control tightening and end-use verification (policy, semiconductors)
    • Tools/Workflows: Expand end-user certifications for AI accelerators; integrate “compute passports” into customs and re-export rules; post-shipment checks.
    • Assumptions/Dependencies: Coordination with allies; limited black/grey market leakage.
  • Whistleblower enablement for AI capability violations (policy, legal)
    • Tools/Workflows: Secure reporting channels; enhanced protections; rewards for verified disclosures related to restricted research or unregistered clusters.
    • Assumptions/Dependencies: Statutory changes; funded investigative capacity; trust in confidentiality.
  • Challenge inspection MOUs and inspectorate formation (policy, international)
    • Tools/Workflows: Bilateral/multilateral agreements granting facility access with notice; trained technical inspectors; standard inspection playbooks.
    • Assumptions/Dependencies: Diplomatic will; manageable scope; clarity on safeguarding trade secrets.
  • Dynamic threshold advisory groups (policy, academia)
    • Tools/Workflows: National expert panels to periodically review ~10^22/10^24 FLOP thresholds; public guidance; alignment with international bodies.
    • Assumptions/Dependencies: Diverse expertise; access to frontier capability data; transparent mandate.

Daily Life and Professional Practice

  • Developer-side compute calculators and compliance checkers (software)
    • Tools/Workflows: IDE plugins/CLI tools estimating FLOPs and flagging potential threshold crossings; warnings for use of restricted methods.
    • Assumptions/Dependencies: Easy integration; accurate heuristics for varied architectures.
  • Organizational training on safe AI use and jailbreak avoidance (education)
    • Tools/Workflows: Short courses for product teams and IT; policies limiting risky prompting; capability-aware access controls for powerful models.
    • Assumptions/Dependencies: Up-to-date threat models; buy-in from team leads.
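
A developer-side compute calculator of the kind described above could be as small as the following sketch. It assumes the common approximation that training a dense transformer costs about 6 FLOP per parameter per token, which is not a method specified by the paper, and it uses the $10^{22}$ and $10^{24}$ FLOP thresholds quoted in the glossary. The function names and returned labels are hypothetical.

```python
MONITORED_THRESHOLD_FLOP = 1e22  # Monitored Threshold quoted from the paper
STRICT_THRESHOLD_FLOP = 1e24     # Strict Threshold quoted from the paper


def estimate_training_flop(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate: ~6 FLOP per parameter per token.
    This is an assumed heuristic for dense transformers; other
    architectures would need a different estimator."""
    return 6.0 * n_params * n_tokens


def classify_run(flop: float) -> str:
    """Map an estimated FLOP count onto the paper's two thresholds."""
    if flop > STRICT_THRESHOLD_FLOP:
        return "prohibited"
    if flop > MONITORED_THRESHOLD_FLOP:
        return "requires approval and monitoring"
    return "below monitored threshold"


# Example: a hypothetical 7e9-parameter model trained on 2e12 tokens.
planned = estimate_training_flop(7e9, 2e12)
status = classify_run(planned)
```

How a run that lands exactly on a threshold should be treated is not stated in the quoted text, so the strict inequalities here are a design choice for the sketch.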

Long-Term Applications

These items likely require further research, scaling, international coordination, hardware redesign, or maturation of institutions.

Industry

  • Standardized, tamper-resistant on-chip verification (semiconductors, cloud)
    • Tools/Products: New accelerator generations with secure enclaves; hardware mode locks; signed telemetry; remote attestation compatible with monitored facilities.
    • Assumptions/Dependencies: Vendor adoption; resistance to firmware bypass; global standards harmonization.
  • Compliance-first AI cloud and centralized monitored compute (cloud, software)
    • Tools/Products: Dedicated “monitored compute providers” licensed to run medium-scale training globally; embedded inspectors; automated reporting to regulators.
    • Assumptions/Dependencies: International licensing; predictable audit costs; customer trust in confidentiality.
  • Insurance and certification markets for AI compliance (finance)
    • Tools/Products: Policies pricing breakout risk; certifications akin to ISO/NIST for compute governance; premiums linked to telemetry strength and inspection history.
    • Assumptions/Dependencies: Actuarial data; regulator recognition; coverage clarity in sanctions contexts.

Academia

  • Mature alignment science with proven benchmarks and track record (education, research)
    • Tools/Products: Longitudinal benchmarks for alignment robustness; standardized eval suites; replication centers demonstrating sustained success.
    • Assumptions/Dependencies: Funding continuity; accepted scientific standards for “alignment sufficiency.”
  • Global model lineage registry and research licensing (education, policy)
    • Tools/Products: Persistent identifiers for models, datasets, and training runs; licensing for potentially dangerous lines of research; public provenance trails.
    • Assumptions/Dependencies: Research community buy-in; national adoption of licensing frameworks.

Policy and Government

  • Binding international treaty with a US–PRC–led Executive Council (policy, international)
    • Tools/Workflows: Formal treaty text; governance charter; membership expansion; modification protocols minimizing veto risks.
    • Assumptions/Dependencies: Political will; mutual assurance mechanisms; stable geopolitical climate.
  • Global chip-location and monitoring network (policy, semiconductors)
    • Tools/Workflows: Integrated supply-chain tracking, customs data fusion, utility metering partnerships, OSINT programs, and challenge inspections to locate and monitor accelerators worldwide.
    • Assumptions/Dependencies: Continued chip-supply concentration; cooperation from utilities and cloud operators; legal authorities for inspections.
  • Research restrictions and whitelists codified in law (policy, education)
    • Tools/Workflows: Statutory definitions of restricted capability research; periodic whitelist updates; safe-use carve-outs (e.g., self-driving, industrial controls).
    • Assumptions/Dependencies: Narrow scoping to minimize collateral damage; appeal processes; international harmonization.
  • Nonproliferation apparatus for AI chips and manufacturing equipment (policy)
    • Tools/Workflows: Export regimes, watchlists, sanctions, and counterproliferation operations; joint task forces to disrupt smuggling networks.
    • Assumptions/Dependencies: Multilateral alignment; intelligence-sharing; effectiveness against covert state-backed efforts.
  • Graduated enforcement mechanisms up to disruption of rogue projects (policy, defense)
    • Tools/Workflows: Diplomatic pressure, economic sanctions, cyber operations, facility shutdowns where lawful; rules of engagement comparable to arms-control contexts.
    • Assumptions/Dependencies: Legal authorities; proportionality and oversight; accurate attribution.
  • Dynamic threshold governance and periodic capability reassessments (policy, academia)
    • Tools/Workflows: Scheduled reviews to update FLOP thresholds and monitoring rules; triggers for tightening if algorithmic efficiency improves faster than expected.
    • Assumptions/Dependencies: Timely data; consensus methods for uncertainty handling; resilience against politicization.
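
To illustrate why algorithmic efficiency gains would trigger tightening, the sketch below lowers a FLOP threshold so that it keeps corresponding to the same "effective" compute. This is an assumed, simplified model (a single multiplicative efficiency gain), not a rule from the paper, and the 2× annual gain in the example is purely hypothetical.

```python
from typing import List


def adjusted_threshold(base_threshold_flop: float, efficiency_gain: float) -> float:
    """Threshold that holds effective compute constant after algorithms
    become `efficiency_gain` times more compute-efficient."""
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return base_threshold_flop / efficiency_gain


def thresholds_over_time(
    base_threshold_flop: float, annual_gain: float, years: int
) -> List[float]:
    """Adjusted threshold for year 0 through `years`, assuming a constant
    (hypothetical) annual efficiency gain."""
    return [
        adjusted_threshold(base_threshold_flop, annual_gain ** y)
        for y in range(years + 1)
    ]


# Example: Strict Threshold of 1e24 FLOP under a hypothetical 2x annual gain.
schedule = thresholds_over_time(1e24, annual_gain=2.0, years=3)
```

In this toy schedule the threshold halves each year; a review body would presumably weigh measured efficiency trends and their uncertainty rather than apply a fixed rate.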

Sectoral Deployment of Safe Narrow AI (examples)

  • Healthcare: Compliance-attested diagnostic support operating in inference-only modes; auditable model updates with compute disclosures.
    • Assumptions/Dependencies: Medical device regulation integration; hospital IT telemetry.
  • Transportation and robotics: Permitted autonomous systems with hardware attestation and capability caps; monitored updates via licensed providers.
    • Assumptions/Dependencies: Safety case frameworks; regulator capacity for audits.
  • Energy and utilities: Grid-integrated monitoring for high-density compute clusters; utility-level anomaly detection aiding treaty verification.
    • Assumptions/Dependencies: Data-sharing agreements; privacy safeguards; cybersecurity hardening.
  • Finance: Model governance with compute risk disclosures and certification; alignment assurance requirements for critical decision systems.
    • Assumptions/Dependencies: Prudential regulator mandates; audit standards.

Societal and Professional Infrastructure

  • International whistleblower protection standards specific to AI (policy, legal)
    • Tools/Products: Cross-border safe reporting channels; standardized evidentiary rules; protection from retaliation.
    • Assumptions/Dependencies: Treaty codification; funding; interoperable legal frameworks.
  • Privacy-preserving monitoring technologies (software, policy)
    • Tools/Products: Secure aggregation, differential privacy, and zero-knowledge proofs enabling verification without revealing sensitive workloads.
    • Assumptions/Dependencies: Advancements in applied cryptography; vendor integration.
  • Emergency response protocols for suspected treaty violations (policy)
    • Tools/Workflows: Joint investigative teams; rapid inspection deployment; escalation ladders and deconfliction guides.
    • Assumptions/Dependencies: Real-time intelligence-sharing; pre-negotiated access rights; clear thresholds for action.
  • Global AI development project post-mitigation (policy, academia, industry)
    • Tools/Workflows: Monitored, cautious development under an internationally supervised program once alignment and misuse controls meet agreed safety bars.
    • Assumptions/Dependencies: Established alignment consensus; reliable verifiability; broad political legitimacy.

Notes on cross-cutting assumptions and dependencies:

  • FLOP thresholds must remain predictive of dangerous capability; rapid algorithmic efficiency gains could undermine them.
  • Ability to locate and monitor existing chips is pivotal; effectiveness declines if accelerators become ubiquitous in consumer devices.
  • Hardware attestation must be tamper-resistant; otherwise physical inspections remain necessary.
  • Success depends on political will, international trust-building, and enforcement credibility; staged implementation reduces barriers.
  • Research restrictions must be narrowly scoped to minimize collateral damage in benign AI subfields; regular reviews are needed to adapt to paradigm shifts.
