
Fighting Failures with FIRE: Failure Identification to Reduce Expert Burden in Intervention-Based Learning (2007.00245v3)

Published 1 Jul 2020 in cs.RO

Abstract: Supervised imitation learning, also known as behavioral cloning, suffers from distribution drift leading to failures during policy execution. One approach to mitigate this issue is to allow an expert to correct the agent's actions during task execution, based on the expert's determination that the agent has reached a 'point of no return.' The agent's policy is then retrained using this new corrective data. This approach alone can enable high-performance agents to be learned, but at a substantial cost: the expert must vigilantly observe execution until the policy reaches a specified level of success, and even at that point, there is no guarantee that the policy will always succeed. To address these limitations, we present FIRE (Failure Identification to Reduce Expert Burden in intervention-based learning), a system that can predict when a running policy will fail, halt its execution, and request a correction from the expert. Unlike existing approaches that learn only from expert data, our approach learns from both expert and non-expert data, akin to adversarial learning. We demonstrate experimentally for a series of challenging manipulation tasks that our method is able to recognize state-action pairs that lead to failures. This permits seamless integration into an intervention-based learning system, where we show an order-of-magnitude gain in sample efficiency compared with a state-of-the-art inverse reinforcement learning method and dramatically improved performance over an equivalent amount of data learned with behavioral cloning.
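The core mechanism the abstract describes can be illustrated with a small sketch: a discriminator-style classifier, trained on both expert (successful) and non-expert (failure) state-action pairs, that gates policy execution and triggers an expert intervention when a failure looks likely. The following is a minimal, hypothetical PyTorch illustration of that idea, not the authors' implementation; the names (FailureClassifier, train_step, should_halt) and the threshold-based halting rule are assumptions for exposition.

```python
# Hypothetical sketch of a FIRE-style failure detector (not the paper's code).
# A small network scores state-action pairs; expert pairs are labeled 0 and
# failure pairs 1, akin to a GAN discriminator separating the two data sources.
import torch
import torch.nn as nn

class FailureClassifier(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # logit: larger means more failure-like
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def train_step(clf, opt, expert_sa, failure_sa):
    """One gradient step on mixed data: expert pairs -> 0, failure pairs -> 1."""
    states = torch.cat([expert_sa[0], failure_sa[0]])
    actions = torch.cat([expert_sa[1], failure_sa[1]])
    labels = torch.cat([torch.zeros(len(expert_sa[0])),
                        torch.ones(len(failure_sa[0]))])
    loss = nn.functional.binary_cross_entropy_with_logits(
        clf(states, actions), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def should_halt(clf, state, action, threshold: float = 0.9) -> bool:
    """Gate execution: halt and request an expert correction above threshold."""
    with torch.no_grad():
        return torch.sigmoid(clf(state, action)).item() > threshold
```

In an intervention loop, should_halt would be checked at each timestep of policy execution; after a halt, the expert's corrective trajectory would be added to the expert set, the preceding agent trajectory to the non-expert set, and both the policy and the classifier retrained, which is what allows the system to learn from non-expert data rather than expert demonstrations alone.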

Authors (3)
  1. Trevor Ablett (11 papers)
  2. Filip Marić (27 papers)
  3. Jonathan Kelly (84 papers)
Citations (6)
