
I Know What You Meant: Learning Human Objectives by (Under)estimating Their Choice Set (2011.06118v2)

Published 11 Nov 2020 in cs.RO

Abstract: Assistive robots have the potential to help people perform everyday tasks. However, these robots first need to learn what it is their user wants them to do. Teaching assistive robots is hard for inexperienced users, elderly users, and users living with physical disabilities, since often these individuals are unable to show the robot their desired behavior. We know that inclusive learners should give human teachers credit for what they cannot demonstrate. But today's robots do the opposite: they assume every user is capable of providing any demonstration. As a result, these robots learn to mimic the demonstrated behavior, even when that behavior is not what the human really meant! Here we propose a different approach to reward learning: robots that reason about the user's demonstrations in the context of similar or simpler alternatives. Unlike prior works -- which err towards overestimating the human's capabilities -- here we err towards underestimating what the human can input (i.e., their choice set). Our theoretical analysis proves that underestimating the human's choice set is risk-averse, with better worst-case performance than overestimating. We formalize three properties to generate similar and simpler alternatives. Across simulations and a user study, our resulting algorithm better extrapolates the human's objective. See the user study here: https://youtu.be/RgbH2YULVRo

Authors (2)
  1. Ananth Jonnavittula (6 papers)
  2. Dylan P. Losey (55 papers)
Citations (14)

Summary

Understanding Human Objectives by Underestimating Choice Sets in Assistive Robotics

The paper presents an approach that enhances reward learning in assistive robotics by underestimating the user's choice set during demonstrations. This method addresses a prevalent issue: assistive robots tasked with learning from human demonstrations often assume, inaccurately, that every user is capable of demonstrating optimal behavior, which leads to suboptimal robot performance.

Key Contributions

The authors propose that, instead of overestimating human capabilities, robots should err on the side of underestimating them, reasoning that human users, particularly those with physical limitations, may only be able to demonstrate trajectories similar to or simpler than the behavior they actually intend. This approach is rooted in risk-aversion, offering better worst-case performance than the risk-seeking behavior common in existing reward learning models.

Theoretical Insights

  1. Risk-Aversion in Choice Sets: The paper theoretically proves that underestimating choice sets leads to risk-averse learning: the robot maintains greater uncertainty about the user's objective, preventing premature convergence on an incorrect reward (see the sketch after this list).
  2. Worst-Case Performance: Under this framework, in the worst case the robot learns nothing rather than learning the wrong reward, making it safer than overestimating choice sets.
  3. Intractability of Complete Choice Sets: The authors argue that waiting for users to reveal their complete choice set through demonstrations is intractable, since users provide only a few demonstrations and never exhibit everything they could have input.
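
To ground these points, here is a minimal sketch (not the authors' code) of Bayesian reward inference under a Boltzmann/Luce choice model, where a demonstration's likelihood is evaluated against an explicit choice set. The inverse temperature `beta` and the candidate reward functions are illustrative assumptions; the key point is that shrinking the choice set to the demonstration plus simpler alternatives flattens the likelihood, keeping the posterior appropriately uncertain.

```python
import numpy as np

def boltzmann_likelihood(reward_fn, demo_idx, choice_set, beta=5.0):
    """P(demo | reward) under a Boltzmann/Luce choice model restricted to choice_set.

    choice_set is a list of trajectories; demo_idx points at the demonstration.
    beta (inverse temperature) is an illustrative assumption.
    """
    scores = np.array([beta * reward_fn(xi) for xi in choice_set])
    scores -= scores.max()  # subtract max for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[demo_idx]

def posterior_over_rewards(reward_fns, demo_idx, choice_set):
    """Uniform-prior Bayesian update over a finite set of candidate reward functions."""
    lik = np.array([boltzmann_likelihood(r, demo_idx, choice_set) for r in reward_fns])
    return lik / lik.sum()
```

With an underestimated choice set, the likelihood discriminates less sharply between candidate rewards, so the posterior stays closer to the prior: in the worst case the robot learns nothing rather than the wrong reward, which is the risk-averse behavior the theory describes.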

Practical Implementation

The authors introduce an algorithm that generates alternative, simpler trajectories from observed demonstrations, thus allowing the robot to better infer the user’s intended objectives. The algorithm formalizes three properties to create similar and simpler alternatives: noisy deformations, sparse inputs, and consistent inputs.
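
A minimal sketch (not the authors' implementation) of how such alternatives might be generated from a single demonstration, loosely following the three properties above. Here a trajectory is taken to be a (T, d) array of user inputs; the noise scale, sparsity level, and alternative counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_deformation(demo, scale=0.05):
    """Noisy deformations: perturb the demonstration with smooth random-walk noise."""
    return demo + scale * rng.standard_normal(demo.shape).cumsum(axis=0)

def sparsify_inputs(demo, keep=0.5):
    """Sparse inputs: zero out a fraction of timesteps, as if the user gave fewer commands."""
    mask = rng.random(demo.shape[0]) < keep
    return demo * mask[:, None]

def make_consistent(demo):
    """Consistent inputs: repeat the mean input, i.e., one command held throughout."""
    return np.tile(demo.mean(axis=0), (demo.shape[0], 1))

def underestimated_choice_set(demo, n_noisy=10):
    """The demonstration plus simpler alternatives -- a deliberate underestimate
    of everything the user could have input."""
    alts = [noisy_deformation(demo) for _ in range(n_noisy)]
    alts += [sparsify_inputs(demo), make_consistent(demo)]
    return [demo] + alts  # demo_idx = 0 for the likelihood sketched earlier
```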

Empirical Evaluation

The approach was evaluated in simulations as well as a user study involving physical teleoperation tasks with a robotic arm. These scenarios mimic environments where users demonstrate tasks such as carrying an item or navigating around obstacles, under varying skill and capability levels.

  1. Simulation Results: The proposed method demonstrated superior performance across several simulation environments by achieving lower errors and regret in estimated rewards compared to traditional Bayesian inverse reinforcement learning models.
  2. User Study: In physical teleoperation tasks, robots using the proposed method were more effective in extrapolating the true human objective, as evidenced by lower objective errors and subjective user preferences favoring this method.
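
For concreteness, here is a minimal sketch of the two evaluation metrics mentioned above, under the common assumption that rewards are linear in trajectory features (the feature vectors themselves are assumptions of this sketch): error compares the normalized learned weights against the true weights, and regret measures the true reward lost by optimizing the learned reward instead of the true one.

```python
import numpy as np

def reward_error(theta_true, theta_learned):
    """Distance between unit-normalized reward weight vectors."""
    t = theta_true / np.linalg.norm(theta_true)
    l = theta_learned / np.linalg.norm(theta_learned)
    return np.linalg.norm(t - l)

def regret(theta_true, feats_opt_true, feats_opt_learned):
    """True reward of the truly optimal trajectory minus the true reward of the
    trajectory that is optimal under the learned weights."""
    return theta_true @ feats_opt_true - theta_true @ feats_opt_learned
```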

Implications and Future Directions

The approach underlines a paradigm shift towards more inclusive learning in assistive robotics by acknowledging user limitations and focusing on similarity and simplicity in choice sets. This is particularly noteworthy in the domain of assistive technology, where end-user capabilities can significantly vary.

For future developments in AI and robotics, these findings suggest a need for more personalized learning systems that can generalize from sparse, noisy data without strong assumptions about user capabilities. Further exploration could refine the choice set generation properties to adapt dynamically to individual user profiles and tasks.

In conclusion, this paper lays the groundwork for more robust and user-oriented assistive robots by fundamentally challenging traditional assumptions about users' demonstration capabilities and proposing a more conservative learning approach that prioritizes safety and inclusivity. The techniques discussed here have broad implications for designing intelligent systems that interact with and learn from human users effectively.
