Two Types of AI Existential Risk: Decisive and Accumulative
The discourse surrounding existential risks from AI traditionally concerns scenarios in which advanced AI systems pose abrupt, profound threats that could lead to human extinction or make any recovery impossible. Such assessments often anticipate a sudden, large-scale event, potentially catalyzed by artificial general intelligence (AGI) or artificial superintelligence (ASI). The paper in question challenges this conventional narrative by distinguishing between two forms of AI existential risk, decisive and accumulative, and posits an alternative trajectory in which gradual, compounding disruptions accumulate over time.
The decisive AI x-risk hypothesis holds that existential threats arise from abrupt, large-scale events triggered by a superintelligent system pursuing misaligned goals with overwhelming rapidity. It is rooted in specific, singular events with catastrophic implications, akin to the apocalyptic consequences often anticipated in uncontrollable-superintelligence scenarios, and it emphasizes rapid system collapse following a trigger event, portraying AI existential risk as a matter of acute, sudden, large-scale impact.
Conversely, the accumulative AI x-risk hypothesis proposes a model in which AI-induced risks emerge incrementally through a series of interconnected, smaller disruptions that manifest over time. This model draws parallels to global existential threats like climate change and nuclear proliferation, where risk becomes significant through the accumulation of individually minor events that collectively exceed a critical threshold. The result is a "boiling frog" scenario: society remains unaware of the creeping risks until they converge and surpass a point of irreversibility.
To elucidate these distinctions, the paper adopts a systems-analysis perspective, emphasizing core characteristics such as non-linearity, connectedness, and feedback loops. The decisive hypothesis is illustrated through rapid cascading effects triggered by an ASI: the networked system is hyper-connected to the ASI, and unidirectional feedback loops reinforce the system's deviation toward catastrophic states. This singular focus on ASI yields a largely homogeneous risk profile, driven by a single, centralized AI force.
The accumulative hypothesis, by contrast, identifies diversified, localized cause-and-effect cycles that contribute to broader systemic change. It suggests selective, heterogeneous connectivity among subsystems that may not link directly to overwhelming outcomes yet can destabilize critical system nodes over time, and it highlights multidirectional feedback loops: changes in one subsystem propagate reciprocally across others, exposing them to cumulative stress until a critical tipping point is reached.
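To make the contrast concrete, consider a minimal toy simulation, our own illustration rather than anything from the paper: a handful of abstract subsystems accumulate small random shocks, a coupling matrix lets stress in one subsystem feed back into the others, and the system "fails" once average stress crosses a tipping point. Every parameter here (subsystem count, shock size, coupling strength, threshold) is an arbitrary assumption chosen only to exhibit the two dynamics.

```python
import numpy as np

# Toy contrast between the two hypotheses (illustrative only, not the
# paper's model). Five abstract subsystems (e.g., economy, public trust)
# carry stress levels in [0, 1]; the system "fails" once mean stress
# exceeds a tipping point.

rng = np.random.default_rng(0)
N, HORIZON, TIPPING_POINT = 5, 200, 0.8

# Multidirectional coupling: stress in one subsystem leaks into others.
coupling = 0.02 * rng.random((N, N))
np.fill_diagonal(coupling, 0.0)

def accumulative_run():
    """Many minor shocks, amplified by cross-subsystem feedback loops."""
    stress = np.zeros(N)
    for t in range(HORIZON):
        stress += rng.random(N) * 0.003   # small, local disruptions
        stress += coupling @ stress       # feedback across subsystems
        stress = np.clip(stress, 0.0, 1.0)
        if stress.mean() > TIPPING_POINT:
            return t                      # gradual drift crosses the threshold
    return None  # without coupling, these shocks alone never get there

def decisive_run():
    """A single abrupt, system-wide shock from one trigger event."""
    stress = np.zeros(N)
    trigger = int(rng.integers(HORIZON))
    for t in range(HORIZON):
        if t == trigger:
            stress[:] = 1.0               # sudden, centralized collapse
        if stress.mean() > TIPPING_POINT:
            return t
    return None

print("accumulative failure at step:", accumulative_run())
print("decisive failure at step:", decisive_run())
```

The point of the sketch is purely qualitative: the decisive run fails at a single, unpredictable instant, while the accumulative run fails by degrees, with no individual shock large enough to blame.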
The scenarios proposed to illustrate these hypotheses trace diverging pathways, from high-profile, abrupt catastrophic events to gradual crises that yield widespread destabilization. The "perfect storm MISTER scenario" postulated in the paper is an illustrative framework detailing a potential socioeconomic collapse driven by compounded ethical and social risks, offering tangible insight into the accumulative hypothesis. Here, the paper envisions AI threats that interweave manipulation, insecurity, surveillance, trust erosion, economic destabilization, and rights infringement (MISTER), illustrating how ethical risks can converge into existential consequences over time.
The implications of the accumulative hypothesis are multifaceted. It suggests that governance strategies cannot afford to overlook the gradual buildup of AI-induced disruptions and must account for both short-term ethical risks and long-term existential hazards within a consolidated risk-management framework. This requires embedding flexibility and responsiveness into policy architectures, enabling nimble adaptation to evolving risk landscapes, and harmonizing ethical protocols with existential threat assessments.
Furthermore, the hypothesis calls for an informed public discourse free from polarized narratives, urging stakeholders to adopt a holistic understanding of AI's diverse potential impacts. By expanding risk analysis beyond the singular-catastrophic-event model, policymakers could set more nuanced regulatory and research priorities and adaptively address the full spectrum of AI risks.
Addressing potential criticisms, the expanded framework expressly accounts for modern interconnectedness and our reliance on technological infrastructures that have historically lacked contingency resilience. While complexity and unpredictability are inherent challenges, the hypothesis motivates the development of comprehensive systemic simulations to anticipate how risks might manifest within highly dynamic, interlinked global systems.
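To gesture at what such a simulation might involve, the toy model above can be swept in Monte Carlo fashion (again purely illustrative, with arbitrary parameters) to estimate how the chance of crossing the tipping point grows as subsystems become more tightly coupled:

```python
import numpy as np

def failure_rate(coupling_scale, trials=300, n=5, horizon=200, tip=0.8):
    """Fraction of runs of the toy accumulative model that cross the
    tipping point, as a function of inter-subsystem coupling strength."""
    rng = np.random.default_rng(1)
    failures = 0
    for _ in range(trials):
        # Fresh random coupling matrix per trial (no self-coupling).
        coupling = coupling_scale * rng.random((n, n))
        np.fill_diagonal(coupling, 0.0)
        stress = np.zeros(n)
        for _ in range(horizon):
            stress = np.clip(stress + rng.random(n) * 0.003
                             + coupling @ stress, 0.0, 1.0)
            if stress.mean() > tip:
                failures += 1
                break
    return failures / trials

# Tighter coupling -> sharply higher odds of systemic failure.
for scale in (0.0, 0.005, 0.01, 0.02):
    print(f"coupling {scale:.3f}: failure rate {failure_rate(scale):.2f}")
```

Note the design choice: nothing about any individual shock changes across the sweep; only the connectivity does, which is precisely where the accumulative hypothesis locates the danger.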
Ultimately, the paper shifts the discourse on AI existential risk by offering an additional lens through which to assess AI's evolving threat landscape. It questions prevailing orthodoxy and advocates an integrated approach to AI risk management, one that recognizes the interplay among ethical, social, and existential AI challenges. As AI continues its transformative trajectory, embracing these nuanced perspectives may prove essential to safeguarding humanity's future in an increasingly automated world.