Incentives for Responsiveness, Instrumental Control and Impact
Abstract: We introduce three concepts that describe an agent's incentives. Response incentives indicate which variables in the environment, such as sensitive demographic information, affect the decision under the optimal policy. Instrumental control incentives indicate whether an agent's policy is chosen to manipulate part of its environment, such as the preferences or instructions of a user. Impact incentives indicate which variables an agent will affect, intentionally or otherwise. For each concept, we establish sound and complete graphical criteria and discuss general classes of techniques that may be used to produce incentives for safe and fair agent behaviour. Finally, we outline how these notions may be generalised to multi-decision settings. This journal-length paper extends our conference publications "Incentives for Responsiveness, Instrumental Control and Impact" and "Agent Incentives: A Causal Perspective": the material on response incentives and instrumental control incentives is updated, while the work on impact incentives and multi-decision settings is entirely new.