International AI Safety Report
- The International AI Safety Report synthesizes evidence, expert analysis, and technical perspectives on advanced AI capabilities, risks, and management strategies for general-purpose AI (GPAI).
- It covers rapidly evolving AI capabilities, details key risks (malicious use, malfunction, systemic), and promotes a defense-in-depth mitigation strategy.
- It emphasizes global collaboration, identifies critical research gaps (e.g., interpretability), and outlines implications for agile, coordinated AI policy and governance.
The International AI Safety Report synthesizes contemporary evidence, expert analysis, and technical perspectives regarding the capabilities, risks, and management strategies for advanced AI systems, with particular focus on general-purpose AI (GPAI). Developed with input from 100 AI experts across countries, disciplines, and governance institutions, the report is recognized as a reference for international policy and technical standards development. It emphasizes the necessity of globally coordinated, technically grounded, and adaptive approaches to AI safety.
1. Capabilities and Developmental Trajectory of Advanced AI
Advanced AI systems, especially general-purpose AI models (GPAIs), have evolved rapidly over recent years. State-of-the-art models exhibit:
- Multimodality: Handling of text, images, video, audio, and complex scientific data, allowing cross-domain tasks.
- Autonomy and Agentic Behavior: Transition from static models to agents capable of planning and interacting within digital and physical environments.
- Performance Benchmarks: GPAIs now match or exceed human expert performance on coding and scientific-reasoning benchmarks (e.g., SWE-bench, MATH, GPQA) and can autonomously solve tasks previously reserved for domain experts.
- Scaling Laws: Progress is empirically tied to scaling compute, data, and parameter counts (see the illustrative form below), with notable improvements in reasoning via increased inference-time compute.
- Accessibility: Drastic reductions in operational costs have widened access to high-capability systems across geographies.
If observed trends continue, training compute could increase by 100× every few years, propelling further advances barring bottlenecks in capital, energy, or data.
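The report's text here does not reproduce a specific equation; as a minimal illustrative sketch (assuming the commonly cited compute-optimal form of Hoffmann et al., 2022, with L for loss, N for parameter count, D for training tokens, and fitted constants E, A, B, α, β), the scaling relationship can be written as:

$$ L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} $$

Under this form, loss falls predictably as parameters and data are scaled together, which is the empirical basis for the compute-growth projection above.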
2. Risk Landscape: Malicious Use, Malfunction, and Systemic Hazards
The risks associated with advanced AI are classified into three categories:
- Malicious Use:
- Fake Content and Deepfakes: Highly realistic synthetic media enables privacy violations, fraud, extortion, and non-consensual image generation, with watermarking and detection currently insufficient to prevent skilled evasion.
- Disinformation at Scale: GPAIs can automate persuasive content, eroding public trust (“liar’s dividend”) and amplifying societal manipulation.
- Automated Cyber Offense: GPAIs can autonomously discover software vulnerabilities, with leading models (e.g., “o1”) identifying up to 79% of bugs in standardized benchmarks, raising concerns for systemic digital security.
- Biological and Chemical Threats: Models can design stepwise plans for pathogen and toxin creation, sometimes exceeding expert capabilities, although real-world weaponization still depends on specialized materials and expertise.
- Malfunction:
- Hallucination and Reliability: Advanced models remain prone to hallucinating facts or making contextually critical errors, posing risks in domains like healthcare and law.
- Bias and Discrimination: Ingrained and amplified biases (race, gender, age, disability, political) persist. Existing techniques reduce but cannot eliminate systemic discrimination, reflecting the “fairness impossibility theorems” (see the sketch after this list).
- Loss of Control: While present GPAI systems are not considered existentially dangerous, multiple lines of theoretical reasoning (including the prevalence of power-seeking behavior under misspecification: “most reward misspecifications induce power-seeking,” Turner et al.) highlight that future, more autonomous systems may become unmanageable.
- Systemic and Societal Risk:
- Labor Market Disruption: Potential for large-scale displacement, with both high- and low-income nations exposed and a risk of further “ghost work” exploitation in data-annotation sectors.
- R&D and Market Concentration: AI progress and decision-making are increasingly centralized among a few entities, creating single points of failure and regulatory challenges.
- Environmental Impact: Compute and data center scaling cause significant energy and resource demands. Efficiency gains are insufficient to offset increasing usage.
- Privacy and Copyright: Data usage at scale leads to privacy breaches and raises unresolved questions regarding intellectual property rights; effective “machine unlearning” remains experimental.
- Open-Weight Model Proliferation: Releasing model weights irreversibly increases the challenge of governance, since weights, once released, cannot be recalled or updated in a way that removes circulating copies from use.
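As context for the fairness impossibility theorems cited above (a minimal sketch, not the report's own formalization): for a binary classifier evaluated on groups with differing base rates $p$ of the positive class, Chouldechova (2017) shows that the false positive rate (FPR), false negative rate (FNR), and positive predictive value (PPV) are linked by the identity

$$ \mathrm{FPR} = \frac{p}{1-p} \cdot \frac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot (1-\mathrm{FNR}) $$

so if two groups have different base rates $p$, no classifier can simultaneously equalize PPV, FPR, and FNR across them; some fairness criterion must be traded off.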
3. Safety Measures and Mitigation Strategies
The report promotes a defense-in-depth strategy: multiple, overlapping safeguards across all stages of model development and deployment.
- Rigorous Evaluations: Employ benchmarks, adversarial (“red-team”) testing, scenario analysis, and incident reporting. These capture only a subset of eventual failure modes, especially for open-ended and context-sensitive systems.
- Transparency, Documentation, and Auditing: Strong technical documentation (model/system cards, transparency reports) and open incident and risk disclosure support external scrutiny.
- Thresholds and Early Warning: Organizations explore the use of capability thresholds (e.g., models capable of facilitating chemical weapon design), but quantifying such capabilities is difficult due to system flexibility and ongoing model evolution.
- Technical Mitigations: Include adversarial training, post-deployment input/output monitoring, privacy-preserving methods (differential privacy, confidential computing; a minimal sketch follows this list), and interpretability research. Lack of interpretability is identified as a major gap.
- Societal and Governance Approaches: Encourage regulatory action to counteract market and competition pressures that can favor rapid release over robust safety, and address the “evidence dilemma” of acting under uncertainty.
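To make the privacy-preserving mitigation above concrete, the following is a minimal, illustrative sketch of the Laplace mechanism for differential privacy; the report does not prescribe an implementation, and the function and parameter names here are hypothetical:

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return a differentially private estimate of a numeric query result.

    Adds Laplace noise with scale sensitivity / epsilon, which satisfies
    epsilon-differential privacy for a query whose output changes by at most
    `sensitivity` when one individual's record is added or removed.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: privately release a count query (sensitivity 1) with epsilon = 0.5.
private_count = laplace_mechanism(true_value=42.0, sensitivity=1.0, epsilon=0.5)
print(f"Noisy count: {private_count:.1f}")
```

Smaller values of epsilon give stronger privacy at the cost of noisier answers, which is the core trade-off such mitigations manage.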
4. International Collaboration and Multidisciplinary Perspectives
International collaboration is central to robust AI safety management:
- Global Scope of Risk: The risk of AI is inherently transnational; capabilities, usage, and externalities cannot be confined within borders.
- Diversity of Contributors: Input from 30+ countries, the UN, OECD, EU, and a broad coalition of civil society and industry stakeholders ensures both technical grounding and attention to varied societal stakes.
- Consensus and Disagreement: There is widespread agreement on the necessity of transparency, capability evaluation, and layered control but variation in estimating existential risk and in balancing openness with control.
- Benefit of Multidisciplinary Teams: Analysis spans social science, law, economics, engineering, and computer science, incorporating risk assessment and safety engineering concepts from nuclear and aviation domains.
5. Research Gaps, Open Questions, and Future Directions
Key unresolved challenges identified:
- Interpretability and Alignment: Progress in understanding and controlling large model behavior lags behind other advances; most systems are “black boxes.”
- Capability and Impact Measurement: Difficulty in developing reliable metrics for real-world harm potential and autonomy.
- Safety Mechanism Generalization: Current mitigation approaches may not scale reliably to future, more powerful or autonomous AI.
- Policy Tensions: Uncertainties around the effectiveness or enforceability of “probabilistic” vs. hard safety cases, and how to balance the benefits of openness with exposure to risk.
- Societal and Ethical Gaps: Advancements are shaped by societal choices, not technical inevitability; inclusion, participatory governance, and the prioritization of global rather than parochial interests remain essential unresolved factors.
6. Implications for International Policy and Governance
The report highlights several imperatives for policy:
- Agility and Adaptation: Policy frameworks must keep pace with technical progress, incorporating adaptive licensing, threshold-triggered requirements, and flexible governance mechanisms.
- Transparency and Standardization: Endorses global standards for documentation, auditing, and evaluation. Calls for international agreement on safety thresholds and minimum transparency for high-impact models.
- Layered and Inclusive Governance: Advocates for policy that builds on defense-in-depth and recognizes both technical and non-technical determinants of risk, spanning technical controls, operational best practices, and societal context.
- Precaution and Coordination: Promotes the precautionary principle and urgent, aligned investment in international technical capacity (including regulatory and evaluation infrastructure) to manage rapidly emerging risks.
- Evidence-informed Action: Urges balancing the cost of acting early on thin evidence against the risk of catastrophic, uncontained advances should risk management lag behind capability growth.
7. Technical Models and Key Results Referenced
- Scaling laws: empirical relationships linking model performance to training compute, data, and parameter counts (see the illustrative form in Section 1).
- Benchmark-based assessment: standardized capability evaluations (e.g., MATH, GPQA, SWE-bench) used to compare models against human expert baselines; a sketch of a commonly used scoring estimator follows this list.
- Power-seeking alignment pathology: “Given sufficiently capable AIs and certain selection dynamics, most reward misspecifications induce power-seeking behavior.”
- Fairness impossibility theorem: Demonstrated impossibility of simultaneously satisfying multiple fairness definitions in algorithmic decision-making.
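As an illustration of how benchmark-based capability assessment is often scored in practice (an assumption for illustration, not a formula reproduced from the report), the widely used unbiased pass@k estimator of Chen et al. (2021) estimates the probability that at least one of k sampled solutions to a task passes, given n samples of which c were correct:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k for a single benchmark task.

    n: total number of sampled solutions for the task
    c: number of those samples that passed the task's tests
    k: number of samples the metric conditions on
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    # Equivalent to 1 - C(n - c, k) / C(n, k), computed in a numerically stable way.
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 200 samples per task, 52 passing; estimate pass@1 and pass@10.
print(round(pass_at_k(n=200, c=52, k=1), 3))   # pass@1 ≈ 0.26
print(round(pass_at_k(n=200, c=52, k=10), 3))
```

Aggregating such per-task estimates across a benchmark suite yields headline scores of the kind cited above for tracking capability progress.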
The International AI Safety Report offers the broadest and most up-to-date synthesis of evidence and expert consensus regarding advanced AI risks and mitigation. Its findings reinforce the need for robust, globally coordinated, adaptable, and evidence-based AI safety strategies, integrating technical, societal, and governance perspectives. Ongoing scientific analysis and stakeholder engagement are identified as critical pathways toward shaping beneficial outcomes as AI capabilities accelerate.