Programmatic Imitation Learning from Unlabeled and Noisy Demonstrations (2303.01440v4)

Published 2 Mar 2023 in cs.RO and cs.PL

Abstract: Imitation Learning (IL) is a promising paradigm for teaching robots to perform novel tasks using demonstrations. Most existing approaches for IL utilize neural networks (NNs); however, these methods suffer from several well-known limitations: they 1) require large amounts of training data, 2) are hard to interpret, and 3) are hard to repair and adapt. There is an emerging interest in programmatic imitation learning (PIL), which offers significant promise in addressing the above limitations. In PIL, the learned policy is represented in a programming language, making it amenable to interpretation and repair. However, state-of-the-art PIL algorithms assume access to action labels and struggle to learn from noisy real-world demonstrations. In this paper, we propose PLUNDER, a novel PIL algorithm that integrates a probabilistic program synthesizer in an iterative Expectation-Maximization (EM) framework to address these shortcomings. Unlike existing PIL approaches, PLUNDER synthesizes probabilistic programmatic policies that are particularly well-suited for modeling the uncertainties inherent in real-world demonstrations. Our approach leverages an EM loop to simultaneously infer the missing action labels and the most likely probabilistic policy. We benchmark PLUNDER against several established IL techniques, and demonstrate its superiority across five challenging imitation learning tasks under noise. PLUNDER policies achieve 95% accuracy in matching the given demonstrations, outperforming the next best baseline by 19%. Additionally, policies generated by PLUNDER successfully complete the tasks 17% more frequently than the nearest baseline.
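The abstract describes an EM loop that alternates between inferring the missing action labels (E-step) and synthesizing the most likely probabilistic policy (M-step). The toy sketch below illustrates that alternation on a made-up problem; the threshold-program "grammar", the Gaussian sensor model, the noise levels, and all function names are illustrative assumptions, not the authors' PLUNDER implementation or its policy language.

```python
# Toy sketch of the EM loop sketched in the abstract (not the authors' code).
# The "synthesizer" here just enumerates a tiny grammar of threshold programs
# "a = 1 if x > theta"; PLUNDER's actual policy language and inference are richer.
import math
import random

def policy_prob(x, theta, eps=0.1):
    """Probabilistic policy: intended action is 1 iff x > theta, flipped with prob eps."""
    intended = 1.0 if x > theta else 0.0
    return intended * (1 - eps) + (1 - intended) * eps  # P(a = 1 | x)

def obs_likelihood(o, a, sigma=0.5):
    """Assumed sensor model: observation o ~ Normal(mean=a, sigma), a in {0, 1}."""
    return math.exp(-0.5 * ((o - a) / sigma) ** 2)

def e_step(demo, theta):
    """Infer soft labels gamma_t = P(a_t = 1 | x_t, o_t, current policy)."""
    gammas = []
    for x, o in demo:
        p1 = policy_prob(x, theta) * obs_likelihood(o, 1)
        p0 = (1 - policy_prob(x, theta)) * obs_likelihood(o, 0)
        gammas.append(p1 / (p1 + p0))
    return gammas

def m_step(demo, gammas, candidate_thetas):
    """Stand-in 'synthesizer': pick the threshold program maximizing expected log-likelihood."""
    def score(theta):
        return sum(g * math.log(policy_prob(x, theta)) +
                   (1 - g) * math.log(1 - policy_prob(x, theta))
                   for (x, _), g in zip(demo, gammas))
    return max(candidate_thetas, key=score)

# Unlabeled, noisy demonstration: the latent program is "a = 1 if x > 0.6".
random.seed(0)
demo = [(x, (1.0 if x > 0.6 else 0.0) + random.gauss(0, 0.5))
        for x in [random.random() for _ in range(200)]]

theta = 0.0                                    # initial policy guess
for _ in range(10):                            # EM loop
    gammas = e_step(demo, theta)               # E-step: infer missing action labels
    theta = m_step(demo, gammas, [i / 20 for i in range(20)])  # M-step: synthesize policy
print("recovered threshold:", theta)
```

Each E-step turns the noisy observations into soft action labels under the current policy, and each M-step plays the synthesizer's role by searching a small space of candidate programs for the one that best explains those labels; in PLUNDER this alternation is what lets a programmatic policy be learned without action labels.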
