
Policy Gradient Bayesian Robust Optimization for Imitation Learning (2106.06499v2)

Published 11 Jun 2021 in cs.LG and cs.AI

Abstract: The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective that balances expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses which can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.
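To make the soft-robust objective concrete, here is a minimal sketch of how such a trade-off between expected performance and risk could be evaluated for one fixed policy. The abstract does not name the risk measure or the weighting scheme, so the use of conditional value at risk (CVaR), the weight `lam`, the tail level `alpha`, and both function names are assumptions made for illustration. The sketch only scores a policy whose expected return under each reward hypothesis is already known; it is not the policy gradient update itself.

```python
def cvar(returns, probs, alpha):
    """CVaR of a discrete return distribution: the mean of the worst
    (1 - alpha) probability mass (assumed risk measure, see lead-in)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")

    def objective(sigma):
        # Rockafellar-Uryasev form: sigma - E[(sigma - X)+] / (1 - alpha)
        shortfall = sum(p * max(sigma - r, 0.0) for r, p in zip(returns, probs))
        return sigma - shortfall / (1.0 - alpha)

    # For a discrete distribution the maximum is attained at one of the atoms.
    return max(objective(s) for s in returns)


def soft_robust(returns, probs, lam, alpha):
    """Hypothetical soft-robust score: lam * expected return + (1 - lam) * CVaR."""
    expected = sum(p * r for r, p in zip(returns, probs))
    return lam * expected + (1.0 - lam) * cvar(returns, probs, alpha)


# Expected return of one policy under four equally likely reward hypotheses.
policy_returns = [1.0, 2.0, 3.0, 4.0]
posterior = [0.25, 0.25, 0.25, 0.25]

for lam in (1.0, 0.5, 0.0):
    print(lam, soft_robust(policy_returns, posterior, lam, alpha=0.5))
```

Under these assumptions, `lam = 1` scores the policy by expected return alone (risk-neutral), `lam = 0` scores it by its worst-case tail alone (risk-averse), and intermediate values interpolate between the two, which is one way to obtain the family of behaviors the abstract describes.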

Authors (8)
  1. Zaynah Javed (3 papers)
  2. Daniel S. Brown (46 papers)
  3. Satvik Sharma (11 papers)
  4. Jerry Zhu (3 papers)
  5. Ashwin Balakrishna (40 papers)
  6. Marek Petrik (43 papers)
  7. Anca D. Dragan (70 papers)
  8. Ken Goldberg (162 papers)
Citations (15)
