AI‑45° Law: Balancing Capability & Safety
- AI‑45° Law is a principle in AGI that mandates parallel improvements in intelligence and safety, visualized as a 45° line in a capability-safety space.
- It is operationalized in AGI training through layered frameworks such as the Causal Ladder of Trustworthy AGI, whose Approximate Alignment, Intervenable, and Reflectable layers provide traceability and control.
- It also influences systemic governance by integrating multi-stakeholder oversight and ethical policies to promote responsible and scalable AI development.
The AI‑45° Law is a guiding principle in artificial general intelligence (AGI) research, formalizing the mandate that advancements in AI capability and safety must progress in parallel. Represented visually as a trajectory in a two-dimensional capability–safety coordinate system, this law asserts that responsible AGI development requires neither safety nor intelligence to be sacrificed in pursuit of the other. It provides a conceptual roadmap supported by systems frameworks and operationalized in contemporary model training, impacting both technical alignment protocols and governance strategies.
1. Principle of the AI‑45° Law
The AI‑45° Law stipulates that as AI systems' capability ($C$) increases, there must be a commensurate enhancement in safety ($S$), such that the ideal development trajectory follows the diagonal $S = C$ (with $C$ as capability and $S$ as safety). Two critical thresholds are delineated:
- Red Line: The boundary beyond which uncontrolled or unsafe developments, such as autonomous replication, instrumental power-seeking, or weaponization, pose catastrophic or existential risks to society.
- Yellow Line: Early warning indicators that signal proximity to dangerous capability regimes, prompting the imposition of significantly more rigorous safety interventions.
This balanced development, fundamental to trustworthy AGI, departs from traditional approaches that often accept a trade-off between performance and precaution. Instead, it frames safety and intelligence as co-evolving, non-competing objectives (Yang et al., 8 Dec 2024).
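A minimal sketch of how this balance condition and the two thresholds could be monitored is given below; the indicator names, tolerance band, and threshold semantics are illustrative assumptions, not definitions from the source.
```python
# Illustrative monitoring check for the AI-45° Law: safety (S) should keep
# pace with capability (C), i.e. stay near the diagonal S = C, while red-line
# capabilities and yellow-line warning indicators are tracked explicitly.
# Indicator names, the tolerance band, and the flag sets are hypothetical.

RED_LINE_CAPABILITIES = {"autonomous_replication", "power_seeking", "weaponization"}
YELLOW_LINE_INDICATORS = {"rapid_capability_jump", "safety_eval_regression"}

def assess(capability: float, safety: float,
           observed_flags: set[str], tolerance: float = 0.05) -> str:
    """Classify a development state against the 45° trajectory and thresholds."""
    if observed_flags & RED_LINE_CAPABILITIES:
        return "red line crossed: development must not proceed"
    if (observed_flags & YELLOW_LINE_INDICATORS) or (capability - safety > tolerance):
        return "yellow line: impose significantly more rigorous safety interventions"
    return "balanced: safety keeps pace with capability (45° trajectory)"

print(assess(0.80, 0.78, set()))                       # near the diagonal
print(assess(0.80, 0.60, set()))                       # safety lagging capability
print(assess(0.90, 0.88, {"autonomous_replication"}))  # red-line capability observed
```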
2. Causal Ladder of Trustworthy AGI
Drawing on Judea Pearl's "Ladder of Causation", the AI‑45° Law is instantiated through the Causal Ladder of Trustworthy AGI, a hierarchical framework comprising three principal layers:
| Layer | Core Mechanisms | Objective |
|---|---|---|
| Approximate Alignment | Supervised fine-tuning, unlearning | Aligns outputs with human values (correlation/association) |
| Intervenable | RL (human/AI feedback), mechanistic interpretability | Enables transparent, in-situ human intervention (intervention) |
| Reflectable | Self-reflection, counterfactual reasoning | Facilitates counterfactual self-correction and world modeling |
- Approximate Alignment Layer: Anchored in data-driven, correlation-based practices, including supervised fine-tuning and machine unlearning, this foundational layer ensures that AI systems' outputs accord with prevailing human ethics and social norms.
- Intervenable Layer: Emphasizes the capacity for external parties to inspect, understand, and, if necessary, intervene during inference—enabling procedural transparency and control through reinforcement learning and mechanistic interpretability.
- Reflectable Layer: Grants the AI robust self-reflective and counterfactual reasoning faculties, allowing the system to adapt its future actions by evaluating "what-if" scenarios, thereby preventing error propagation and reinforcing longitudinal reliability.
Along a further dimension, the ladder distinguishes between endogenous trustworthiness (intrinsic safety architecture) and exogenous trustworthiness (external validation and oversight), reflecting a comprehensive approach to trustworthy AGI (Yang et al., 8 Dec 2024).
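Purely as an illustrative encoding, and not an interface defined in the source, the ladder's layers and their mechanisms can be represented as a small data structure:
```python
# Illustrative encoding of the Causal Ladder layers; the class and field
# names are descriptive conveniences, not an API from the source.
from dataclasses import dataclass

@dataclass(frozen=True)
class LadderLayer:
    name: str
    mechanisms: tuple[str, ...]
    objective: str

CAUSAL_LADDER = (
    LadderLayer("Approximate Alignment",
                ("supervised fine-tuning", "machine unlearning"),
                "align outputs with human values (association)"),
    LadderLayer("Intervenable",
                ("RL from human/AI feedback", "mechanistic interpretability"),
                "enable transparent, in-situ human intervention"),
    LadderLayer("Reflectable",
                ("self-reflection", "counterfactual reasoning"),
                "support counterfactual self-correction and world modeling"),
)
```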
3. Operationalization in Model Training
The AI‑45° Law finds concrete application in systems such as SafeWork‑R1 (Lab et al., 24 Jul 2025), where capability and safety are co-evolved during training using the SafeLadder framework:
- Chain-of-Thought Supervised Fine-Tuning (CoT SFT): Trains stepwise reasoning patterns, establishing traceable logic chains.
- Multimodal, Multitask, Multiobjective Reinforcement Learning (M³-RL): Incorporates composite rewards for safety, helpfulness, task adherence, and output format, maximizing both capability and precaution.
- Safe-and-Efficient RL: Utilizes efficiency metrics, including Conditional Advantage for Length-based Estimation (CALE), rewarding succinct and safe responses.
- Deliberative Search RL: Embeds iterative, external information retrieval mechanisms during inference, guided by a confidence metric and real-time verification.
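As a minimal sketch of how the multiobjective reward in M³-RL and CALE-style length shaping might be combined, the snippet below assumes a simple weighted aggregation; the weights, component scores, and length penalty are illustrative assumptions rather than the paper's exact formulation.
```python
# Illustrative multiobjective reward aggregation in the spirit of M3-RL;
# weights, component scorers, and the length shaping are assumptions.

def composite_reward(safety: float, helpfulness: float, task: float,
                     format_ok: float, length_tokens: int,
                     mean_length: float,
                     weights=(0.4, 0.3, 0.2, 0.1),
                     length_penalty: float = 0.001) -> float:
    """Combine per-response objective scores (each in [0, 1]) into one reward.

    A small penalty on length relative to the batch mean mimics the idea of
    CALE-style shaping: succinct responses that stay safe are preferred.
    """
    w_safe, w_help, w_task, w_fmt = weights
    base = (w_safe * safety + w_help * helpfulness
            + w_task * task + w_fmt * format_ok)
    return base - length_penalty * max(0.0, length_tokens - mean_length)

# Example: a safe, helpful, but verbose answer vs. a concise one.
print(composite_reward(0.9, 0.8, 0.85, 1.0, length_tokens=600, mean_length=400))
print(composite_reward(0.9, 0.8, 0.85, 1.0, length_tokens=350, mean_length=400))
```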
Mathematically, the Clipped Policy Gradient Optimization with Policy Drift (CPGD) objective governs stable policy refinement. Schematically, it combines a clipped policy-gradient surrogate with a policy-drift penalty:
$$\mathcal{J}_{\text{CPGD}}(\theta) = \mathbb{E}_t\!\left[\min\!\big(r_t(\theta)\,A_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,A_t\big)\right] - \beta\, D_{\mathrm{KL}}\!\big(\pi_{\theta_{\text{old}}}\,\Vert\,\pi_{\theta}\big),$$
where $r_t(\theta)$ is the importance ratio between the current and previous policies, $A_t$ is the advantage function, and the KL penalty enforces conservative updates to avoid sudden unsafe policy drift.
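The schematic objective above can be sketched in PyTorch as follows; the token-level KL estimate and hyperparameters are illustrative assumptions, not the paper's exact CPGD implementation.
```python
# Sketch of a clipped policy-gradient loss with a KL "policy drift" penalty,
# following the schematic objective above; not the paper's exact CPGD code.
import torch

def cpgd_style_loss(logp_new: torch.Tensor,   # log pi_theta(a|s), shape [T]
                    logp_old: torch.Tensor,   # log pi_theta_old(a|s), shape [T]
                    advantages: torch.Tensor, # A_t, shape [T]
                    clip_eps: float = 0.2,
                    kl_coef: float = 0.05) -> torch.Tensor:
    ratio = torch.exp(logp_new - logp_old)                 # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    surrogate = torch.min(unclipped, clipped).mean()       # clipped surrogate
    # Simple sampled estimate of KL(pi_old || pi_theta) as a drift penalty.
    kl_drift = (logp_old - logp_new).mean()
    return -(surrogate - kl_coef * kl_drift)               # minimize the negative objective
```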
A constrained RL formulation further optimizes responses under explicit safety and confidence constraints, dynamically updated through Lagrange multipliers in a dual optimization routine. Principled Value Models (PVMs) are then used at inference to score and select candidate tokens, ensuring safety via context-sensitive routing vectors.
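To make the dual-optimization idea concrete, the following is a minimal sketch of Lagrange-multiplier updates for safety and confidence constraints; the thresholds, step size, and penalized objective are hypothetical placeholders rather than the source's formulation.
```python
# Illustrative primal-dual loop for constrained RL: the reward is maximized
# while Lagrange multipliers grow whenever a constraint is violated.
# Thresholds and learning rates are hypothetical placeholders.

def dual_update(lambda_safe: float, lambda_conf: float,
                safety_score: float, confidence: float,
                safety_threshold: float = 0.9,
                confidence_threshold: float = 0.7,
                lr: float = 0.01) -> tuple[float, float]:
    """Projected gradient-ascent step on the dual variables."""
    lambda_safe += lr * (safety_threshold - safety_score)      # grows if unsafe
    lambda_conf += lr * (confidence_threshold - confidence)    # grows if unsure
    return max(0.0, lambda_safe), max(0.0, lambda_conf)

def lagrangian_objective(reward: float, safety_score: float, confidence: float,
                         lambda_safe: float, lambda_conf: float,
                         safety_threshold: float = 0.9,
                         confidence_threshold: float = 0.7) -> float:
    """Reward penalized by weighted constraint violations (the primal objective)."""
    return (reward
            - lambda_safe * max(0.0, safety_threshold - safety_score)
            - lambda_conf * max(0.0, confidence_threshold - confidence))
```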
4. Trustworthy AGI: Five Levels
The framework defines five progressive levels of AGI trustworthiness, forming a taxonomy for both evaluation and design:
| Level | Focus | Key Mechanisms |
|---|---|---|
| Perception Trustworthiness | Input reliability and bias mitigation | Data preprocessing, sensor validation |
| Reasoning Trustworthiness | Transparent, causal reasoning | Chain-of-thought, logical traceability |
| Decision-making Trustworthiness | Ethically justified, context-aware decisions | Ethical constraints, intervention hooks |
| Autonomy Trustworthiness | Dynamic self-regulation and self-correction | Reflective loops, runtime adaptation |
| Collaboration Trustworthiness | Reliable interaction and consensus with other agents | Protocol enforcement, multi-agent governance |
This staged approach integrates progressively more sophisticated safeguards, advancing from robust, unbiased input handling to complex, transparent multi-agent negotiation and protocol compliance (Yang et al., 8 Dec 2024).
5. Empirical Performance and Model Generalizability
Empirical evaluation demonstrates the practical impact of the AI‑45° Law when instantiated in SafeWork‑R1 (Lab et al., 24 Jul 2025):
- Safety Performance: SafeWork‑R1 achieves a 46.54% mean improvement over its Qwen2.5‑VL‑72B base on established safety-related benchmarks, while maintaining general reasoning capabilities.
- Comparative Benchmarks: It exceeds the safety scores of GPT-4.1 and Claude Opus 4 on benchmarks such as MM‑SafetyBench and SIUO, yielding higher safe-response rates.
- Framework Generality: The SafeLadder protocol has successfully scaled to diverse architectures and sizes, including InternVL3‑78B and DeepSeek‑R1‑Distill‑Llama‑70B. This suggests the modular stagewise alignment paradigm can be broadly adopted within and beyond the LLM segment.
6. Systemic Governance and Oversight
Technological strategies are complemented by systemic governance proposals:
- Lifecycle Management: Oversight protocols spanning the entire AI development and deployment cycle.
- Multi-Stakeholder Involvement: Co-governance frameworks actively engaging governmental bodies, industry, academia, and civil society.
- Ethical AI Governance: Formal policies (termed “Governance for Good”) that go beyond harm avoidance, orienting innovations toward societal benefit.
- Global Public Good Perspective: Acknowledgment of AI safety as a non-rivalrous, non-excludable global public good, requiring supranational collaboration.
These measures address the limitations of purely technical safeguards and seek to institutionalize responsibility, transparency, and equity in AGI’s global trajectory (Yang et al., 8 Dec 2024).
7. Formal Representation and Conceptual Visualization
While the theoretical underpinnings are largely conceptual, the law is formalized using standard mathematical notation and LaTeX presentations:
- The ideal co-evolution trajectory is symbolically the line $S = C$ in a capability–safety plane.
- Optimization constructs use $\mathbb{E}$, $\operatorname{clip}$, and $D_{\mathrm{KL}}$ notation in the policy learning pipeline.
- Illustrative figures represent both the Causal Ladder and the matrix of AGI trustworthiness levels to clarify the architecture and governance structure.
This symbolic formalism consolidates the AI‑45° Law’s role as both a theoretical and practical foundation for successive AGI research, systems building, and policy development.