Leveraging Demonstrator-perceived Precision for Safe Interactive Imitation Learning of Clearance-limited Tasks (2402.13466v1)
Abstract: Interactive imitation learning is an efficient, model-free method through which a robot learns a task by repeatedly alternating between executing a learning policy and collecting data by querying human demonstrations. However, deploying immature policies for clearance-limited tasks, such as industrial insertion, poses significant collision risks. For such tasks, a robot should detect collision risks and request intervention, ceding control to a human when a collision is imminent. Detecting such risks typically requires an accurate model of the environment, a need that significantly limits the scope of interactive imitation learning (IIL) applications. In contrast, humans implicitly convey the precision an environment requires by adjusting their behavior to avoid collisions while performing a task. Inspired by this behavior, this paper presents Demonstrator-perceived Precision-aware Interactive Imitation Learning (DPIIL), a novel interactive learning method that uses demonstrator-perceived precision as the criterion for human intervention. DPIIL captures precision by observing the speed-accuracy trade-off exhibited in human demonstrations and cedes control to a human to avoid collisions in states where high precision is estimated. DPIIL thus improves the safety of interactive policy learning and preserves efficiency without requiring explicit, precise information about the environment. We assessed DPIIL's effectiveness through simulations and real-robot experiments in which a UR5e 6-DOF robotic arm was trained to perform assembly tasks. Our results show significantly improved training safety, and DPIIL's best performance compared favorably with that of other learning methods.
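The gating idea in the abstract can be made concrete with a small sketch. The snippet below is a minimal, hypothetical illustration, not the paper's actual estimator: it proxies demonstrator-perceived precision by the inverse of the local demonstration speed, following the speed-accuracy trade-off the method exploits, and cedes control to the human wherever that proxy exceeds a threshold. The class name `PrecisionGate`, the k-nearest-neighbor estimator, and the threshold value are all assumptions made for illustration.

```python
"""Minimal sketch of a DPIIL-style intervention gate (hypothetical interfaces).

Idea from the abstract: slow, careful demonstration segments signal high
demonstrator-perceived precision (speed-accuracy trade-off), so the robot
should cede control to the human in such states.
"""
import numpy as np


class PrecisionGate:
    def __init__(self, demo_states, demo_speeds, k=10, precision_threshold=2.0):
        # demo_states: (N, d) states visited during human demonstrations
        # demo_speeds: (N,) end-effector speed observed at each demo state
        self.states = np.asarray(demo_states, dtype=float)
        self.speeds = np.asarray(demo_speeds, dtype=float)
        self.k = k
        self.threshold = precision_threshold

    def perceived_precision(self, state):
        """Proxy for demonstrator-perceived precision at `state`: the inverse
        of the mean demonstrated speed among the k nearest demonstration
        states. Slower demonstrations imply higher perceived precision."""
        d = np.linalg.norm(self.states - np.asarray(state, dtype=float), axis=1)
        nearest = np.argsort(d)[: self.k]
        mean_speed = self.speeds[nearest].mean()
        return 1.0 / max(mean_speed, 1e-6)

    def human_should_take_over(self, state):
        """Cede control when estimated precision is high, i.e. where the
        demonstrator slowed down, presumably to avoid collisions."""
        return self.perceived_precision(state) > self.threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy demonstrations: fast motion in free space (x < 0.5), slow motion
    # near a tight-clearance region (x >= 0.5).
    states = rng.uniform(0.0, 1.0, size=(200, 2))
    speeds = np.where(states[:, 0] < 0.5, 1.0, 0.1) + 0.02 * rng.standard_normal(200)
    gate = PrecisionGate(states, np.clip(speeds, 0.01, None))

    for s in ([0.2, 0.5], [0.9, 0.5]):
        print(s, "-> human takes over:", gate.human_should_take_over(s))
```

In the method itself, precision is estimated from the demonstrations rather than by this nearest-neighbor proxy; the sketch only illustrates the gating logic of switching control to the human where estimated precision is high.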
- Hanbit Oh
- Takamitsu Matsubara