- The paper challenges the use of utility functions for defining objectives in AI value alignment, arguing that they break down in complex ethical contexts, a point it supports by drawing parallels to impossibility theorems in ethics.
- It discusses how impossibility results such as Arrow's Theorem, together with paradoxes in population ethics, highlight the inherent difficulty of aggregating multiple, often competing, ethical objectives into a single preference order for AI systems.
- The author proposes incorporating uncertainty into AI objectives (e.g., via partial orders) and presents a formal uncertainty theorem showing the minimum uncertainty required to resolve cyclic impossibility found in ethical paradoxes.
AI Value Alignment: Impossibility and Uncertainty Theorems
The paper "Impossibility and Uncertainty Theorems in AI Value Alignment" presents a critical analysis of the role of utility functions in machine learning systems, highlighting the challenges they pose in ethical decision-making contexts. The author, Peter Eckersley, argues against the use of utility functions to define objectives in systems with multi-dimensional goals, drawing parallels to foundations in ethical theory where impossibility theorems have exposed the limitations of such formal specifications.
Overview
Utility functions are widely used in machine learning to formalize objectives, appearing as value functions, loss functions, and preference orderings. When applied to settings that demand ethical decisions, such as autonomous weapons systems or medical resource allocation, they fail to capture the full ethical complexity of the situation and often run afoul of broad ethical intuitions. Eckersley notes that ethicists have derived impossibility theorems showing that no way of aggregating individual utilities into a social welfare ordering satisfies every reasonable ethical principle at once. The root of the problem is that such decisions involve several independent, often competing objectives that cannot be collapsed into a single total order over outcomes, as the toy sketch below illustrates.
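To make the aggregation problem concrete, the following sketch (constructed for this summary, not code from the paper; the objective names and rankings are assumptions) shows how pairwise-majority aggregation of three competing objectives over three outcomes yields a cyclic, non-transitive preference that no single utility function can represent.

```python
from itertools import combinations

# Three hypothetical objectives (the names are illustrative assumptions),
# each expressed as a total order over three candidate outcomes A, B, C.
rankings = {
    "fairness": ["A", "B", "C"],   # A > B > C
    "accuracy": ["B", "C", "A"],   # B > C > A
    "cost":     ["C", "A", "B"],   # C > A > B
}

def prefers(ranking, x, y):
    """True if the ranking places x above y."""
    return ranking.index(x) < ranking.index(y)

# Aggregate the objectives by pairwise majority vote.
for x, y in combinations(["A", "B", "C"], 2):
    votes_for_x = sum(prefers(r, x, y) for r in rankings.values())
    winner, loser = (x, y) if votes_for_x >= 2 else (y, x)
    print(f"majority prefers {winner} over {loser}")

# The aggregate relation is A > B, B > C, and C > A: a cycle, so no single
# total order (and hence no utility function) can represent it.
```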
Impossibility Theorems
Prominently featured are impossibility theorems such as Arrow's Impossibility Theorem, which shows that no procedure for aggregating individual preferences into a social preference ordering can simultaneously satisfy a small set of reasonable fairness conditions, and Arrhenius's work in population ethics, which shows that no welfare-based ranking of populations can meet all of a handful of intuitively compelling adequacy conditions. The paper argues that these results make it infeasible to align AI with human ethical standards using conventional objective functions.
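As a concrete instance of the kind of paradox Arrhenius formalizes, consider the classic Repugnant Conclusion; the numbers below are invented for illustration and do not come from the paper.

```python
# Illustrative numbers for the Repugnant Conclusion, one of the paradoxes
# that Arrhenius-style impossibility theorems generalize.
pop_a = {"size": 1_000_000,     "welfare": 100}  # small population, high welfare
pop_z = {"size": 1_000_000_000, "welfare": 1}    # huge population, lives barely worth living

def total_welfare(p):
    return p["size"] * p["welfare"]

def average_welfare(p):
    return p["welfare"]

# Total utilitarianism ranks Z above A (1e9 > 1e8), endorsing the Repugnant
# Conclusion; average utilitarianism avoids it here but violates other
# plausible conditions in related scenarios. The impossibility theorems show
# that no single ranking satisfies every such condition at once.
print(total_welfare(pop_a), total_welfare(pop_z))      # 100000000 1000000000
print(average_welfare(pop_a), average_welfare(pop_z))  # 100 1
```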
Introducing Uncertainty
To escape such paradoxes, Eckersley proposes embracing uncertain objectives, formalized through frameworks such as partial orders over outcomes or probability distributions over total orders. The crux of these alternatives is that they accommodate uncertainty, shifting AI decision-making from a deterministic to a probabilistic process; a sketch of one such representation follows.
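Here is a minimal sketch of what an uncertain objective could look like, assuming a toy representation as a probability distribution over candidate total orders (the outcomes, orders, and probabilities are invented for illustration).

```python
# A toy "uncertain objective": a probability distribution over total orders
# of three outcomes, instead of one fixed ranking. Probabilities and
# outcomes are illustrative assumptions.
uncertain_objective = [
    (0.5, ["A", "B", "C"]),   # with probability 0.5 the true ranking is A > B > C
    (0.3, ["B", "A", "C"]),
    (0.2, ["C", "A", "B"]),
]

def prob_prefers(dist, x, y):
    """Probability mass on orders that place x above y."""
    return sum(p for p, order in dist if order.index(x) < order.index(y))

print(prob_prefers(uncertain_objective, "A", "B"))  # 0.7
print(prob_prefers(uncertain_objective, "B", "A"))  # 0.3

# A partial order is the other representation mentioned: some pairs are
# simply left incomparable rather than being assigned probabilities.
```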
Key Results
- Uncertainty Bound: The paper establishes a formal uncertainty theorem quantifying the minimum amount of uncertainty needed to dissolve the cyclic impossibilities that arise in objective functions. When an ethical paradox is recast as a problem of uncertain objectives, at least two of the conflicting constraints must be held with uncertainty; relaxing only one is not enough.
- Decision Rules: It discusses probabilistic decision rules for acting under this kind of uncertainty, in contrast to approaches that demand a strict prioritization of one outcome over all others; the resulting behavior more closely resembles how humans reason through ethical dilemmas. A toy rule of this kind is sketched after this list.
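The sketch below illustrates one possible probabilistic decision rule of this flavor: sample a total order in proportion to its probability and act on its top choice. It is an illustration under assumed probabilities, not necessarily the specific rule analyzed in the paper.

```python
import random

# Reusing the toy uncertain objective from the earlier sketch; probabilities
# are illustrative assumptions, not values from the paper.
uncertain_objective = [
    (0.5, ["A", "B", "C"]),
    (0.3, ["B", "A", "C"]),
    (0.2, ["C", "A", "B"]),
]

def sample_choice(dist, rng=random):
    """Draw one total order in proportion to its probability and act on its
    top-ranked outcome, so no single constraint is deterministically prioritized."""
    r = rng.random()
    cumulative = 0.0
    for p, order in dist:
        cumulative += p
        if r <= cumulative:
            return order[0]
    return dist[-1][1][0]  # guard against floating-point rounding at the boundary

choices = [sample_choice(uncertain_objective) for _ in range(10_000)]
print({x: choices.count(x) / len(choices) for x in ("A", "B", "C")})
# Roughly {'A': 0.5, 'B': 0.3, 'C': 0.2}
```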
Implications and Future Work
The implications of this research are significant for the development and deployment of AI in ethically charged domains. The paper's totalitarian convergence and pluralistic non-convergence conjectures offer a novel lens for interpreting how AI systems tend to behave under moral certainty versus moral uncertainty. Going forward, further exploration is warranted into decision rules that make the best use of uncertain objectives, and into alternative models that address ethical alignment without relying on total-order assumptions.
Conclusion
Eckersley’s paper challenges conventional paradigms in AI objective formulation, urging the integration of ethical uncertainty into systems capable of making high-stakes decisions. By grounding future AI developments in frameworks embracing moral complexity and uncertainty, researchers can potentially mitigate risks associated with rigid goal structures, leading to systems that not only assess outcomes with nuance but act with measured discretion and responsibility.