IIFL: Implicit Interactive Fleet Learning from Heterogeneous Human Supervisors (2306.15228v2)

Published 27 Jun 2023 in cs.RO and cs.AI

Abstract: Imitation learning has been applied to a range of robotic tasks, but can struggle when robots encounter edge cases that are not represented in the training data (i.e., distribution shift). Interactive fleet learning (IFL) mitigates distribution shift by allowing robots to access remote human supervisors during task execution and learn from them over time, but different supervisors may demonstrate the task in different ways. Recent work proposes Implicit Behavior Cloning (IBC), which is able to represent multimodal demonstrations using energy-based models (EBMs). In this work, we propose Implicit Interactive Fleet Learning (IIFL), an algorithm that builds on IBC for interactive imitation learning from multiple heterogeneous human supervisors. A key insight in IIFL is a novel approach for uncertainty quantification in EBMs using Jeffreys divergence. While IIFL is more computationally expensive than explicit methods, results suggest that IIFL achieves a 2.8x higher success rate in simulation experiments and a 4.5x higher return on human effort in a physical block pushing task over (Explicit) IFL, IBC, and other baselines.
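The abstract combines two technical ideas. The first is an energy-based model (EBM) that scores state-action pairs, where lower energy means a more likely action. The second is a Jeffreys-divergence measure of uncertainty. The toy sketch below illustrates both ideas. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state:

1. The action space is one-dimensional and discretized into a grid, so each EBM induces a Boltzmann distribution p(a | s) ∝ exp(−E(s, a)) over that grid.
2. Uncertainty is scored as the mean pairwise Jeffreys divergence, KL(P‖Q) + KL(Q‖P), across a small ensemble of EBMs.

The ensemble, the grid, and the names `boltzmann`, `jeffreys`, `ensemble_uncertainty`, `bimodal`, and `right_only` are all hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence

# Hypothetical energy function type: E(state, action) -> scalar energy (lower = more likely).
EnergyFn = Callable[[float, float], float]


def boltzmann(energies: Sequence[float]) -> List[float]:
    """Normalize energies into probabilities p_i proportional to exp(-E_i).

    Subtracting the minimum energy first keeps exp() numerically stable.
    """
    m = min(energies)
    weights = [math.exp(-(e - m)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def jeffreys(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Jeffreys divergence KL(p||q) + KL(q||p) = sum_i (p_i - q_i) * log(p_i / q_i).

    eps guards against log(0) if a probability underflows; every term stays >= 0.
    """
    return sum((pi - qi) * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def ensemble_uncertainty(
    energy_fns: Sequence[EnergyFn], state: float, actions: Sequence[float]
) -> float:
    """Mean pairwise Jeffreys divergence between the action distributions of an
    (assumed, hypothetical) ensemble of EBMs evaluated at one state."""
    if len(energy_fns) < 2:
        raise ValueError("need at least two models to measure disagreement")
    dists = [boltzmann([e(state, a) for a in actions]) for e in energy_fns]
    pairs = list(combinations(dists, 2))
    return sum(jeffreys(p, q) for p, q in pairs) / len(pairs)


# Usage: a discretized 1-D action grid over [-1, 1] (illustrative only).
actions = [i / 50 - 1.0 for i in range(101)]

# Hypothetical EBMs: one with two modes (at -0.5 and +0.5), one with a single mode at +0.5.
bimodal = lambda s, a: 20.0 * min((a - 0.5) ** 2, (a + 0.5) ** 2)
right_only = lambda s, a: 20.0 * (a - 0.5) ** 2

print(ensemble_uncertainty([bimodal, bimodal], 0.0, actions))     # 0.0: models agree
print(ensemble_uncertainty([bimodal, right_only], 0.0, actions))  # positive: models disagree
```

In this toy setting, two models that agree on a bimodal action distribution score zero, even though actions sampled from them would be widely spread. Such a distribution might arise, for example, when two supervisors demonstrate different valid actions. A measure based on the variance of sampled actions would instead flag that spread as uncertainty. In this sketch, only disagreement between the models' distributions raises the score.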
