Provably Safe Systems: Ensuring Safe and Controllable AGI
The paper "Provably Safe Systems: The Only Path to Controllable AGI," presents a comprehensive framework for achieving safety in the deployment and operation of Artificial General Intelligences (AGIs) through the implementation of provably compliant systems. Authored by Max Tegmark and Steve Omohundro, the paper outlines the imperative of creating AGI systems that are mathematically assured to operate within human-specified constraints, a direction that necessitates the pathways for advanced methodologies and technologies in formal verification and mechanistic interpretability.
Rationale for Provably Safe Systems
The authors argue that current AI safety efforts, especially alignment techniques, help steer AI systems toward human interests but are insufficient against threats from potentially adversarial AGIs. Their argument rests on a simple principle: no AGI, however capable or autonomous, can perform an action that has been mathematically proven impossible. This principle grounds an approach in which formal proofs serve as indisputable guarantees of safe operation, removing the need for continuous human oversight.
Core Concepts and Methodologies
The paper introduces several core constructs within its proposed framework:
- Provably Compliant Systems (PCS): Systems designed to adhere to formal specifications, with embedded proofs certifying their compliance with safety requirements.
- Proof-Carrying Code (PCC): Software distributed together with an explicit formal proof that it meets a defined safety specification, so a recipient can check the proof rather than trust the producer (see the sketch after this list). PCC is pivotal for defending against malicious software and strengthening cybersecurity.
- Provably Compliant Hardware (PCH): Physical hardware designed to operate only within the constraints defined by provable contracts (PCs) and enforced through secure hardware mechanisms.
- Provable Contracts and Meta-Contracts: Contracts that define the operational boundaries and requirements a system must provably satisfy before taking physical actions. Meta-contracts in turn govern how contracts are created and modified, encoding relevant human values and legal requirements.
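To make the proof-carrying-code idea concrete, here is a minimal sketch in Lean 4; the `clampSafe` function and its specification are illustrative assumptions, not an example from the paper. The program ships together with a machine-checkable theorem that its output stays within a declared safe range, with the specification playing the role of a simple provable contract:

```lean
-- Minimal proof-carrying-code sketch (Lean 4; illustrative, not from the
-- paper). The "code" clamps a control signal into a safe range; the
-- accompanying "proof" certifies that the range is never violated.

-- The program: clamp a control signal x into the range [lo, hi].
def clampSafe (lo hi x : Int) : Int :=
  if x < lo then lo else if x > hi then hi else x

-- The specification, acting as a provable contract: whenever lo ≤ hi,
-- the output is guaranteed to lie within [lo, hi].
theorem clampSafe_spec (lo hi x : Int) (h : lo ≤ hi) :
    lo ≤ clampSafe lo hi x ∧ clampSafe lo hi x ≤ hi := by
  unfold clampSafe
  split <;> try split
  all_goals omega

-- A recipient re-runs Lean's proof checker over this file before use;
-- if checking succeeds, the code provably meets its contract, with no
-- trust placed in whoever (or whatever) produced it.
```

The design point this illustrates: the proof travels with the code, so compliance is checked at the receiving end rather than assumed on the authority of the sender.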
Addressing Current and Future Challenges
The paper outlines a series of pressing challenges and anticipates that overcoming them will significantly enhance AI safety. These include automating formal verification so it scales to real-world complexity, developing comprehensive verification benchmarks, and harnessing AI itself to synthesize and verify code automatically. The authors stress the importance of rapidly advancing proof assistants and applying them to domains such as blockchain protocols and cryptographic implementations, all in service of a mathematically proven safety net.
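As a small illustration of the intended division of labor (a sketch under assumed details, not an example from the paper): an untrusted AI can synthesize both a routine and a candidate proof, and the proof assistant's small trusted kernel re-checks the proof, so only the checker needs to be trusted. Here `absVal` is a hypothetical AI-synthesized routine:

```lean
-- Sketch of AI-assisted verification (Lean 4). Proof *search* may be
-- expensive and can be delegated to a powerful, untrusted model; proof
-- *checking* is cheap, mechanical, and performed by a small kernel.

-- Hypothetical AI-synthesized routine: integer absolute value.
def absVal (x : Int) : Int :=
  if x < 0 then -x else x

-- Safety property to certify: the output is never negative. The tactic
-- script stands in for a proof an AI might emit; Lean's kernel
-- re-verifies it from scratch on every build.
theorem absVal_nonneg (x : Int) : 0 ≤ absVal x := by
  unfold absVal
  split <;> omega
```

Even if such a proof were found by a system no human understands, accepting it requires trusting only the small proof checker, not the system that produced it.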
Social and Governance Implications
Beyond the technical dimensions, the paper emphasizes the importance of integrating provable compliance into social and governance structures. It proposes game-theory-informed governance models that incentivize cooperation among AI systems and reinforce societal well-being. Mechanisms designed under this framework can align corporate and governmental AI deployments with human-centered goals, ensuring that AGIs foster collaborative progress rather than exacerbating competitive or adversarial dynamics.
Conclusion and Future Directions
Tegmark and Omohundro advocate a paradigm shift in how AGI systems are conceptualized and developed. They maintain that proof-carrying AGIs are not merely one prospective solution but a necessary pathway for ensuring human control over superintelligent systems. The authors identify a range of open research problems and invite the scientific community to help refine these safety methodologies.
The implications of provably safe systems are profound, suggesting a future in which AGIs incontrovertibly adhere to human-defined safety protocols, cementing their trustworthy integration into human society. Such a framework would not only secure near-term technological interactions but, more importantly, safeguard long-term human agency amid the rise of intelligent machines.