
Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021) (2112.15422v1)

Published 25 Nov 2021 in cs.AI

Abstract: The paper "Reward is Enough" by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, it is still undesirable to use this approach for the development of artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour.

Citations (61)

Summary

  • The paper challenges the sufficiency of scalar rewards by demonstrating they cannot capture the inherent multi-objective nature of both biological systems and artificial intelligence.
  • The paper highlights that multi-objective frameworks address trade-offs between conflicting goals, improving decision-making in areas like perception, social interaction, and language.
  • The paper advocates for multi-policy learning and ethical oversight in AGI development to ensure safe and adaptable optimization of complex utility functions.

Scalar Reward is Not Enough: A Response to Silver et al.

The paper, "Scalar Reward is Not Enough: A Response to Silver, Singh, Precup, and Sutton (2021)," rigorously critiques the proposition that a scalar reward signal suffices to underpin all aspects of both natural and artificial intelligence. The authors of this response argue that Silver et al.'s focus on scalar rewards neglects critical aspects of intelligence that inherently require multi-objective decision-making.

Limitations of Scalar Rewards

The paper delineates several limitations of scalar rewards:

  1. Trade-offs Between Conflicting Objectives: Many real-world tasks involve balancing multiple conflicting objectives. In biological systems, organisms must address numerous drives such as hunger, social interaction, and pain avoidance. In computational systems, confining decision-making to scalar rewards oversimplifies this complexity.
  2. Utility Representation: Scalar rewards are inadequate for representing all forms of desired utility, especially when the trade-offs between objectives are non-linear. In contrast, multi-objective representations can directly optimize such utility functions (illustrated by the sketch following this list).
  3. Flexibility and Adaptation: Multi-objective representations offer greater adaptability to changing environments and goals. Multi-policy learning in multi-objective reinforcement learning enables agents to promptly adapt to new utility functions, which is a significant advantage over scalar reward-based agents.
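
As a rough, hypothetical illustration of points 2 and 3 (not taken from the paper), the following Python sketch defines a vector-valued reward over two made-up objectives and a non-linear, thresholded utility. The policy it selects is Pareto-optimal but lies in a concave region of the front, so no fixed linear weighting of the objectives, i.e. no scalar reward built that way, would ever rank it first.

```python
# Hypothetical illustration (not from the paper): a vector-valued reward whose
# desired utility is non-linear, so no fixed linear scalarization recovers it.
import numpy as np

# Expected return of three candidate policies on two made-up objectives:
# (task performance, safety margin).
policy_returns = {
    "policy_a": np.array([10.0, 1.0]),
    "policy_b": np.array([5.0, 5.0]),   # Pareto-optimal, but in a concave region of the front
    "policy_c": np.array([1.0, 10.0]),
}

def thresholded_utility(returns, floor=4.0):
    """Non-linear utility: maximize performance, but only among policies
    whose safety margin clears a minimum acceptable floor."""
    performance, safety = returns
    return performance if safety >= floor else float("-inf")

def linear_utility(returns, weights):
    """Scalarized alternative: a fixed weighted sum of the objectives."""
    return float(np.dot(weights, returns))

# The thresholded utility selects policy_b.
print(max(policy_returns, key=lambda p: thresholded_utility(policy_returns[p])))

# No non-negative weighting ranks policy_b above BOTH policy_a and policy_c:
# b > a needs 4*w_safety > 5*w_perf, while b > c needs 4*w_perf > 5*w_safety,
# which cannot hold simultaneously.  Every weighting below picks a or c.
for w in (np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.1, 0.9])):
    print(w, max(policy_returns, key=lambda p: linear_utility(policy_returns[p], w)))
```

Running the loop confirms that every fixed weighting picks policy_a or policy_c, whereas the thresholded utility selects policy_b directly.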

Multi-Objective Nature of Intelligent Capabilities

The authors of the response critique the argument by Silver et al. that various intelligent abilities could arise from the maximization of a scalar reward. For instance:

  • Perception: Effective perception often requires balancing the costs of acquiring information against the potential benefits, which inherently calls for multi-objective trade-offs.
  • Social Intelligence: Multi-agent environments necessitate considering multiple agents' objectives, which scalar rewards cannot sufficiently capture.
  • Language: Human language serves multiple roles beyond the mere communication of factual information, such as emotional expression and relationship maintenance, which are multi-objective in nature.
  • Generalization: While scalar rewards might enable some forms of generalization, adapting to new tasks and environments often requires weighing multiple objectives, a flexibility better afforded by multi-objective representations (see the sketch following this list).
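
As a loose illustration of this adaptability argument (again hypothetical, not from the paper), a multi-policy agent can store a small set of policies covering different trade-offs and simply re-select among them when its utility function changes, rather than relearning from scratch as a scalar-reward agent typically must:

```python
# Hypothetical sketch of multi-policy adaptation: the agent stores the expected
# vector return of several pre-trained policies and re-selects when the
# deployment-time utility function changes.  All names and numbers are made up.
from typing import Callable, Dict, Tuple

Returns = Tuple[float, float]  # (task performance, resource efficiency)

coverage_set: Dict[str, Returns] = {
    "fast_policy":     (9.0, 2.0),
    "frugal_policy":   (3.0, 9.0),
    "balanced_policy": (6.5, 6.5),
}

def select_policy(utility: Callable[[Returns], float]) -> str:
    """Pick the stored policy that maximizes the current utility function."""
    return max(coverage_set, key=lambda name: utility(coverage_set[name]))

# Preferences can change at deployment time without any new learning:
print(select_policy(lambda r: 0.8 * r[0] + 0.2 * r[1]))     # -> fast_policy
print(select_policy(lambda r: r[1]))                        # -> frugal_policy
print(select_policy(lambda r: r[0] if r[1] >= 6.0 else 0))  # -> balanced_policy
```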

Biological Evidence for Multi-Objective Intelligence

The paper underscores that natural intelligences, derived through evolutionary processes, are inherently multi-objective. For example:

  • Behavioral Objectives: Organisms exhibit a variety of innate drives, such as hunger, thirst, and social bonding, which cannot be reduced to a single scalar reward.
  • Neurochemical and Neuroanatomical Evidence: The brain's reward processes involve multiple neurochemical signals such as dopamine for various types of reward prediction errors, and oxytocin for social bonding, indicating a multi-objective architecture.

Practical and Theoretical Implications

For AGI Development: The paper argues against employing scalar rewards for AGI development, citing the risks of unsafe and unethical behavior. In scenarios analogous to the "paperclip maximizer," unbounded scalar reward maximization could lead to unintended and dangerous outcomes. Multi-objective approaches, coupled with mechanisms for ethical and safety oversight, offer a more responsible pathway for AGI development.
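
One simple way to operationalize such oversight (a hypothetical sketch, not the paper's own proposal) is a lexicographic ordering in which a safety or ethics objective takes strict priority over task reward, so that high task reward can never buy its way past an unacceptable safety level:

```python
# Hypothetical sketch: lexicographic action selection with a safety objective
# that takes strict priority over task reward (within a small tolerance).
# Action names and values are invented for illustration only.
SLACK = 0.05  # safety values within this tolerance of the best are treated as equal

# Estimated (safety score, task reward) for each available action.
action_values = {
    "aggressive_plan": (0.40, 9.5),  # highest task reward, poor safety
    "cautious_plan":   (0.95, 6.0),
    "balanced_plan":   (0.93, 7.5),
}

def lexicographic_choice(values, slack=SLACK):
    """Keep only actions whose safety is within `slack` of the best achievable
    safety, then pick the highest task reward among those."""
    best_safety = max(safety for safety, _ in values.values())
    admissible = {a: v for a, v in values.items() if v[0] >= best_safety - slack}
    return max(admissible, key=lambda a: admissible[a][1])

print(lexicographic_choice(action_values))  # -> balanced_plan, never aggressive_plan
```

Under this ordering, no amount of task reward compensates for falling outside the admissible safety band, which is the kind of bounded behavior an unbounded scalar objective cannot guarantee.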

Future Developments in AI: The authors advocate for multi-objective AI systems that incorporate a feedback review mechanism. Such systems can better align with human values and priorities, offering adaptability and transparency. Explicitly acknowledging multiple objectives allows for more nuanced and accountable decision-making processes, vital for the safe deployment of AGI.

Conclusion

The authors of "Scalar Reward is Not Enough" provide a robust counter-argument to the reward-is-enough hypothesis, emphasizing the need for multi-objective approaches in both understanding natural intelligence and developing artificial general intelligence. The complexity of real-world tasks, the diversity of biological drives, and the necessity for adaptable and ethical AGI systems all underscore the inadequacy of scalar rewards. This comprehensive critique presents compelling evidence and reasoning to guide future AI research towards more sophisticated, multi-objective frameworks.
