- The paper introduces a unified framework that treats human actions as approximately optimal choices to infer intended reward functions.
- It leverages probabilistic models, including Boltzmann-rational policies, to integrate diverse feedback such as demonstrations and trajectory comparisons.
- Experiments in grid-world tasks indicate the approach reduces regret and outperforms reward inference from demonstrations or comparisons alone.
Reward-rational (Implicit) Choice: A Unified Framework for Reward Learning
This paper introduces a comprehensive formalism for interpreting diverse human behaviors as evidence for reward learning in robotics. The authors propose that human actions can be viewed as reward-rational implicit choices from a set of options, which can be leveraged to infer intended reward functions. The formalism encompasses a range of feedback types, including demonstrations, trajectory comparisons, corrections in physical interactions, language instructions, and more subtle indicators such as turning a robot off.
Key Observations and Methodological Innovations
Recent advances have broadened the types of human behavior considered as sources of information for reward learning. This work proposes viewing all of these behaviors through a unified lens: each can be interpreted as a rational choice informed by the reward the human intends the robot to optimize. This unifying perspective not only organizes prior methods but can also guide the development of new ways to interpret human behavior as data for reward learning.
Reward-rational choice treats human behavior as an approximately optimal choice from an implicit or explicit choice set, where a grounding function maps each option to the robot trajectories it implies. Under this assumption, the intended reward can be estimated with a probabilistic model of human rationality, typically a Boltzmann-rational model in which the probability of an option increases exponentially with its reward, so the human is treated as noisily rather than perfectly rational.
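As a concrete sketch rather than the paper's implementation, the snippet below shows how one observed choice can update a belief over candidate reward functions under a Boltzmann-rational observation model; the option features, candidate reward parameters, and rationality coefficient `beta` are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical setup: each row of `option_features` is the feature vector of the
# robot trajectory that one option in the choice set grounds to.
option_features = np.array([
    [1.0, 0.0],   # option 0: fast but passes close to an obstacle
    [0.5, 1.0],   # option 1: slower but keeps more clearance
    [0.0, 0.2],   # option 2: barely moves
])
# Candidate reward parameters theta; the reward is assumed linear in the features.
candidate_thetas = np.array([
    [1.0, 0.0],   # cares only about speed
    [0.0, 1.0],   # cares only about clearance
    [0.5, 0.5],   # balances both
])
prior = np.full(len(candidate_thetas), 1.0 / len(candidate_thetas))
beta = 2.0        # rationality coefficient: higher means closer to perfectly rational

def boltzmann_likelihood(choice_idx, theta, features):
    """P(human picks option `choice_idx` | theta) under Boltzmann rationality."""
    logits = beta * (features @ theta)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[choice_idx]

def update_belief(prior, choice_idx, features):
    """Bayesian update of the belief over candidate rewards given one observed choice."""
    likelihoods = np.array([
        boltzmann_likelihood(choice_idx, theta, features) for theta in candidate_thetas
    ])
    posterior = prior * likelihoods
    return posterior / posterior.sum()

# The human picked option 1, the cautious trajectory.
posterior = update_belief(prior, 1, option_features)
print(posterior)   # belief shifts toward reward functions that value clearance
```

A higher `beta` treats the human as closer to perfectly rational, so a single observed choice shifts the belief more sharply.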
Empirical Implications
Tables and illustrations in the paper show how each type of feedback constrains the set of feasible reward functions, using grid-world navigation tasks as a running example. The unifying framework captures both explicit choices, such as comparisons between trajectories, and implicit ones, such as the information conveyed when a human simply switches the robot off.
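To illustrate how different feedback types fit the same machinery, the hedged sketch below treats a trajectory comparison and an off-switch event as two instantiations that differ only in their choice sets and groundings (all feature values hypothetical); the Boltzmann-rational update itself is unchanged.

```python
import numpy as np

candidate_thetas = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
beta = 2.0

def posterior_after_choice(prior, features, choice_idx):
    """One Boltzmann-rational Bayesian update: belief over theta after one choice."""
    logits = beta * (features @ candidate_thetas.T)            # (options, thetas)
    probs = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs /= probs.sum(axis=0, keepdims=True)                  # P(choice | theta)
    post = prior * probs[choice_idx]
    return post / post.sum()

belief = np.full(len(candidate_thetas), 1 / len(candidate_thetas))

# Trajectory comparison: explicit choice between the two trajectories shown.
comparison = np.array([[0.8, 0.1],    # trajectory A
                       [0.3, 0.9]])   # trajectory B (the human prefers this one)
belief = posterior_after_choice(belief, comparison, choice_idx=1)

# Off switch: implicit choice between the robot's planned trajectory and idling.
off_switch = np.array([[1.0, -0.5],   # continue the current plan
                       [0.0,  0.0]])  # switched off / idle (the human picks this)
belief = posterior_after_choice(belief, off_switch, choice_idx=1)

print(belief)  # belief concentrates on rewards under which idling beat the plan
```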
Experiments suggest the approach reduces regret and improves reward inference across training and test environments, potentially surpassing methods that rely on demonstrations or comparisons alone.
Future Directions
By embracing this unified formalism, robots could draw on a combination of feedback types, actively selecting the most informative kind of human input. The paper further proposes treating the meta-choice, the human's selection of which feedback type to give, as an additional source of information. Interpreted through the reward-rational choice lens, this meta-level reasoning could refine reward estimates even further.
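One plausible way to operationalize selecting the most informative kind of human input, shown in the sketch below, is to score each candidate feedback type by its expected information gain under the current belief; the choice sets and belief here are hypothetical and the selection rule is not taken from the paper.

```python
import numpy as np

candidate_thetas = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
belief = np.array([0.5, 0.3, 0.2])   # current belief over candidate rewards
beta = 2.0

def choice_probs(features):
    """P(choice | theta) for every option/theta pair under Boltzmann rationality."""
    logits = beta * (features @ candidate_thetas.T)
    probs = np.exp(logits - logits.max(axis=0, keepdims=True))
    return probs / probs.sum(axis=0, keepdims=True)            # (options, thetas)

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_info_gain(features):
    """Expected reduction in belief entropy after observing one human choice."""
    probs = choice_probs(features)
    p_choice = probs @ belief                                  # marginal P(choice)
    gain = 0.0
    for c, pc in enumerate(p_choice):
        post = belief * probs[c]
        post /= post.sum()
        gain += pc * (entropy(belief) - entropy(post))
    return gain

# Two hypothetical queries the robot could pose to the human.
feedback_types = {
    "comparison": np.array([[0.8, 0.1], [0.3, 0.9]]),
    "off_switch": np.array([[1.0, -0.5], [0.0, 0.0]]),
}
best = max(feedback_types, key=lambda k: expected_info_gain(feedback_types[k]))
print(best)  # the feedback type whose answer is expected to be most informative
```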
This foundation invites further research into actively mixing feedback types, leveraging pragmatic reasoning in language instructions, and exploring new feedback mechanisms. The paper outlines pathways for closer collaboration between human guidance and machine interpretation, ultimately aiming to improve alignment between robot behavior and human values.
Conclusion
The unified framework offers a principled approach to refining reward learning systems, combining theoretical rigor with illustrative empirical support. The work positions the reward-rational choice model as a key tool for bridging the gap between human input and autonomous robotic optimization, with the promise of more responsive, human-aligned intelligent systems.