Toward Idealized Decision Theory

Published 7 Jul 2015 in cs.AI | (1507.01986v1)

Abstract: This paper motivates the study of decision theory as necessary for aligning smarter-than-human artificial systems with human interests. We discuss the shortcomings of two standard formulations of decision theory, and demonstrate that they cannot be used to describe an idealized decision procedure suitable for approximation by artificial systems. We then explore the notions of policy selection and logical counterfactuals, two recent insights into decision theory that point the way toward promising paths for future research.


Authors (2)

Citations (37)


Summary

Toward Idealized Decision Theory: A Comprehensive Summary

The paper "Toward Idealized Decision Theory" by Nate Soares and Benja Fallenstein addresses fundamental challenges in constructing decision theories (DTs) suitable for smarter-than-human artificial intelligence systems. It identifies deficiencies in the two standard decision theories, Evidential Decision Theory (EDT) and Causal Decision Theory (CDT), and proposes research directions built on two ideas, policy selection and logical counterfactuals, that better capture how AI systems should make decisions.

Shortcomings of Existing Decision Theories

The two standard formulations of decision theory, EDT and CDT, each have critical limitations when used to specify how an artificial agent should decide. EDT, which evaluates actions by the expected utility of outcomes conditioned on the action being taken, can be swayed by correlations that its action does nothing to influence, and it is undefined for actions of zero probability, since one cannot condition on a probability-zero event. This leads to irrational behavior in scenarios such as "Evidential Blackmail", where an agent pays a blackmailer because, by Bayesian conditioning, paying is evidence of a good outcome, even though paying does not cause that outcome.
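The evidential blackmail reasoning can be made concrete with a toy calculation. The sketch below is a minimal model under stated assumptions: the payoffs, the 1% prior on a crash, and the rule that the blackmailer writes only when exactly one of "the agent pays" and "the stock crashes" holds are illustrative choices, not figures from the paper, and all function names are hypothetical. Conditioning on the letter makes paying look far better to EDT, while a causal evaluation that holds the crash probability fixed prefers refusing.

```python
from fractions import Fraction

# Hypothetical numbers chosen for illustration; they are not taken from the paper.
COST_OF_PAYING = 1_000
LOSS_FROM_CRASH = 100_000
PRIOR_CRASH = Fraction(1, 100)


def letter_sent(crash: bool, pays: bool) -> bool:
    """Assumed blackmailer rule: write iff exactly one of (agent pays, stock crashes) holds."""
    return crash != pays


def utility(crash: bool, pays: bool) -> int:
    return -(LOSS_FROM_CRASH if crash else 0) - (COST_OF_PAYING if pays else 0)


def edt_value(pays: bool) -> Fraction:
    """E[U | action, letter received]. The crash is independent of the action a priori,
    but conditioning on the letter links them. Raises ZeroDivisionError if the
    conditioning event has probability zero."""
    numerator = Fraction(0)
    denominator = Fraction(0)
    for crash in (True, False):
        weight = PRIOR_CRASH if crash else 1 - PRIOR_CRASH
        if letter_sent(crash, pays):
            numerator += weight * utility(crash, pays)
            denominator += weight
    return numerator / denominator


def cdt_value(pays: bool, credence_crash: Fraction) -> Fraction:
    """Paying cannot cause or prevent the crash, so the credence is held fixed."""
    return (credence_crash * utility(True, pays)
            + (1 - credence_crash) * utility(False, pays))


def edt_choice() -> bool:
    """True means 'pay'."""
    return max((True, False), key=edt_value)


def cdt_choice(credence_crash: Fraction) -> bool:
    """True means 'pay'."""
    return max((True, False), key=lambda pays: cdt_value(pays, credence_crash))
```

With these numbers EDT values paying at -1,000 and refusing at -100,000, so it pays. For every fixed credence in a crash, the causal comparison favors refusing by exactly the ransom. Setting `PRIOR_CRASH` to zero makes `edt_value(False)` raise `ZeroDivisionError`, mirroring the problem of conditioning on a probability-zero event.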

CDT, by contrast, evaluates actions by their causal effects, and so avoids EDT's susceptibility to spurious correlations. However, CDT faces its own problems when outcomes depend on logical rather than causal relationships, for instance in symmetric interactions between agents running the same decision algorithm, whose choices are logically linked even though neither causes the other. The paper illustrates this with examples like the "Prisoner's Dilemma" and "Counterfactual Blackmail", where CDT recommends actions that lead to suboptimal outcomes because it overlooks logical relations between its own decision and the actions of other agents.
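The contrast between causal and logical reasoning in a symmetric game can be sketched as follows. This is an assumed setup, not the paper's formalism: the payoff numbers are a standard illustrative Prisoner's Dilemma, and `same_algorithm_choice` is a hypothetical, crude stand-in for reasoning about logical dependence that is valid only because both players are assumed to run identical code.

```python
# Row player's payoffs in a one-shot Prisoner's Dilemma (illustrative numbers).
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}
ACTIONS = ("C", "D")


def cdt_choice(credence_opponent_cooperates: float) -> str:
    """The opponent's move is causally independent of mine, so its distribution is held fixed."""
    q = credence_opponent_cooperates

    def value(action: str) -> float:
        return q * PAYOFF[(action, "C")] + (1 - q) * PAYOFF[(action, "D")]

    return max(ACTIONS, key=value)


def same_algorithm_choice() -> str:
    """If the opponent runs the same algorithm, its output equals mine,
    so only the diagonal outcomes (C, C) and (D, D) are reachable."""
    return max(ACTIONS, key=lambda action: PAYOFF[(action, action)])
```

Defection wins the causal comparison for every credence, yet two copies that both follow the causal rule each receive 1 rather than the 2 available from mutual cooperation.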

Towards More Robust Decision Theories

To counter these limitations, the authors argue for a shift towards decision theories that capitalize on two new insights: policy selection and logical counterfactuals.

Policy Selection: Rather than choosing a single action after making observations, the agent selects an entire policy, a mapping from observations to actions, and then executes whatever that policy prescribes. The policy is chosen to maximize expected utility as evaluated before any observations are made, rather than using only the information available at decision time. This can avoid some failures shared by EDT and CDT, such as giving in to blackmail that is made only because the agent is predicted to pay.
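A toy model shows how evaluating whole policies can differ from evaluating actions after an observation. Everything here is an assumption for illustration: the payoffs, the single observation, and an environment in which a perfect predictor blackmails only agents whose policy is to pay. The helper names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator

# Hypothetical payoffs for a blackmail toy model; not taken from the paper.
RANSOM = 1_000
THREATENED_HARM = 50_000
OBSERVATIONS = ("blackmailed",)   # a realistic policy would cover many observations
ACTIONS = ("pay", "refuse")

Policy = Dict[str, str]


def all_policies() -> Iterator[Policy]:
    """Enumerate every mapping from observations to actions."""
    for actions in product(ACTIONS, repeat=len(OBSERVATIONS)):
        yield dict(zip(OBSERVATIONS, actions))


def outcome(policy: Policy) -> int:
    """Assumed environment: a perfect predictor blackmails only agents whose
    policy pays when blackmailed, and otherwise leaves them alone."""
    if policy["blackmailed"] == "pay":
        return -RANSOM          # blackmailed, and pays
    return 0                    # never blackmailed in the first place


def select_policy() -> Policy:
    """Policy selection: score whole policies before any observation is made."""
    return max(all_policies(), key=outcome)


def action_after_threat() -> str:
    """Action-level reasoning once the threat has arrived, treating the
    prediction as already fixed: pay the ransom or suffer the harm."""
    values = {"pay": -RANSOM, "refuse": -THREATENED_HARM}
    return max(values, key=values.get)
```

Here `select_policy()` returns `{"blackmailed": "refuse"}`, since that policy is never blackmailed, while reasoning after the threat has arrived picks "pay", because -1,000 beats -50,000 once the prediction is treated as fixed.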

Logical Counterfactuals: Instead of relying solely on causal counterfactuals, an agent asks what would happen if its decision algorithm produced a given output, taking into account every place that algorithm is instantiated or predicted, including copies of the agent and other agents' models of it. This directly addresses the logical correlations that CDT ignores, improving decisions in cases where other decision processes depend on the agent's choice.
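One crude way to approximate a logical counterfactual is to substitute a constant program for the agent's algorithm everywhere the environment runs it. The sketch below uses a Newcomb-style setup, a standard example in this literature, as an assumed test case; the payoffs and function names are hypothetical. The substitution trick works only because this environment accesses the agent through explicit calls, which is far from the general definition that the paper treats as an open problem.

```python
from typing import Callable

Agent = Callable[[], str]   # an agent is modeled as a zero-argument decision procedure


def environment(agent: Agent) -> int:
    """Hypothetical Newcomb-style world: a predictor runs the agent's own code
    to decide whether to fill an opaque box, then the agent itself acts."""
    prediction = agent()                        # the predictor's simulation
    opaque_box = 1_000_000 if prediction == "one-box" else 0
    choice = agent()                            # the agent's actual decision
    return opaque_box if choice == "one-box" else opaque_box + 1_000


def logical_counterfactual(action: str) -> int:
    """'What if my algorithm output `action`?' Approximated by replacing every
    call to the algorithm, including the predictor's, with a constant program."""
    return environment(lambda: action)


def decide() -> str:
    return max(("one-box", "two-box"), key=logical_counterfactual)


def causal_value(action: str, fixed_prediction: str) -> int:
    """Causal-style evaluation: intervene only on the agent's own call and
    hold the predictor's output fixed."""
    opaque_box = 1_000_000 if fixed_prediction == "one-box" else 0
    return opaque_box if action == "one-box" else opaque_box + 1_000
```

`decide()` returns "one-box" because the substituted program also fixes the predictor's simulation, giving 1,000,000 versus 1,000. By contrast, `causal_value` holds the prediction fixed, so two-boxing always looks 1,000 better; this is the kind of logical correlation the causal view ignores.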

Implications and Future Directions

Updateless Decision Theory (UDT), which combines policy selection and logical counterfactuals, emerges as a promising direction for decision theories applicable to AI systems. However, finding a satisfactory formalization of logical counterfactuals remains a significant hurdle. Extending causal graphs to "logical graphs" that capture the logical dependencies present in real-world scenarios requires new methods for defining and manipulating these dependencies precisely.
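Putting the two ideas together, a toy updateless agent can enumerate policies, score each one from the prior, and let the environment call the candidate policy wherever it consults the agent. This is a minimal sketch under strong assumptions: observations and actions are finite, the environment is an explicit Python function, and a hypothetical coin-flip world (not from the paper) stands in for a real decision problem. Passing the policy table into the environment sidesteps, rather than solves, the formalization problem described above.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, Iterator

Policy = Callable[[str], str]
OBSERVATIONS = ("heads", "tails")
ACTIONS = ("pay", "refuse")


def environment(policy: Policy) -> Fraction:
    """Hypothetical world: a fair coin is flipped. On tails the agent is asked
    for 100; on heads it receives 10,000 iff it would pay on tails."""
    half = Fraction(1, 2)
    # Tails branch: the agent itself acts on its observation.
    tails_payoff = -100 if policy("tails") == "pay" else 0
    # Heads branch: a predictor simulates what the agent would do on tails.
    heads_payoff = 10_000 if policy("tails") == "pay" else 0
    return half * tails_payoff + half * heads_payoff


def all_policies() -> Iterator[Dict[str, str]]:
    """Every finite table mapping observations to actions."""
    for actions in product(ACTIONS, repeat=len(OBSERVATIONS)):
        yield dict(zip(OBSERVATIONS, actions))


def udt_policy() -> Dict[str, str]:
    """Policy selection from the prior; each candidate table is substituted
    for the agent everywhere the environment consults it."""
    return max(all_policies(), key=lambda table: environment(table.__getitem__))


def act(observation: str) -> str:
    return udt_policy()[observation]
```

An agent that first updates on seeing tails and then chooses a single action compares -100 with 0 and refuses. The policy-level evaluation pays on tails, because the paying policy is worth 4,950 in expectation from the prior. The "heads" entry of the table has no effect in this toy world.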

The paper emphasizes the urgency of understanding these decision-theoretic issues before complex AI systems are deployed, stressing the risks of deploying AI guided by incomplete or flawed decision theories. Although proof-based formalizations of UDT offer preliminary sketches of potential solutions, these theories are still being refined, and further research is needed to overcome the identified challenges and any that have not yet been foreseen.

In conclusion, the paper is optimistic that a more complete understanding of decision theory is achievable. Such an understanding would help ensure that future AI systems act in line with human values and intentions, even when facing unprecedented decision problems. The study serves as a call to action for further, intensive research into decision theory to support the safe development of autonomous intelligent systems.
