
Leveraging Demonstrator-perceived Precision for Safe Interactive Imitation Learning of Clearance-limited Tasks (2402.13466v1)

Published 21 Feb 2024 in cs.RO

Abstract: Interactive imitation learning (IIL) is an efficient, model-free method through which a robot can learn a task by repeatedly alternating between executing a learning policy and collecting data by querying human demonstrations. However, deploying immature policies in clearance-limited tasks, such as industrial insertion, poses significant collision risks. For such tasks, a robot should detect collision risks and request intervention by ceding control to a human when collisions are imminent. The former requires an accurate model of the environment, a need that significantly limits the scope of IIL applications. In contrast, humans implicitly demonstrate environmental precision by adjusting their behavior to avoid collisions when performing tasks. Inspired by this human behavior, this paper presents a novel interactive learning method, Demonstrator-perceived Precision-aware Interactive Imitation Learning (DPIIL), that uses demonstrator-perceived precision as a criterion for human intervention. DPIIL captures precision by observing the speed-accuracy trade-off exhibited in human demonstrations and cedes control to a human to avoid collisions in states where high precision is estimated. DPIIL improves the safety of interactive policy learning and ensures efficiency without explicitly providing precise information about the environment. We assessed DPIIL's effectiveness through simulations and real-robot experiments in which a UR5e 6-DOF robotic arm was trained to perform assembly tasks. Our results show significantly improved training safety, and our best performance compared favorably with other learning methods.

References (22)
  1. T. Osa, J. Pajarinen, G. Neumann, J. A. Bagnell, P. Abbeel, and J. Peters, “An algorithmic perspective on imitation learning,” Found. and Trends® in Robotics, vol. 7, no. 1-2, pp. 1–179, 2018.
  2. C. Celemin, R. Pérez-Dattari, E. Chisari, G. Franzese, L. de Souza Rosa, R. Prakash, Z. Ajanović, M. Ferraz, A. Valada, J. Kober et al., “Interactive imitation learning in robotics: A survey,” Found. and Trends® in Robotics, vol. 10, no. 1-2, pp. 1–197, 2022.
  3. W. A. Wickelgren, “Speed-accuracy tradeoff and information processing dynamics,” Acta Psychologica, vol. 41, no. 1, pp. 67–85, 1977.
  4. K. Menda, K. Driggs-Campbell, and M. J. Kochenderfer, “EnsembleDAgger: A Bayesian approach to safe imitation learning,” in IEEE/RSJ Int. Conf. on Intelli. Robots and Sys., 2019, pp. 5041–5048.
  5. J. Zhang and K. Cho, “Query-efficient imitation learning for end-to-end simulated driving,” in Proceedings of the AAAI Conf. on Artificial Intelli., 2017, pp. 2891–2897.
  6. R. Hoque, A. Balakrishna, C. Putterman, M. Luo, D. S. Brown, D. Seita, B. Thananjeyan, E. Novoseller, and K. Goldberg, “LazyDAgger: Reducing Context Switching in Interactive Imitation Learning,” in IEEE Int. Conf. on Autom. Sci. and Engineering, 2021, pp. 502–509.
  7. R. Hoque, A. Balakrishna, E. Novoseller, A. Wilcox, D. S. Brown, and K. Goldberg, “ThriftyDAgger: Budget-aware novelty and risk gating for interactive imitation learning,” in Conf. on Robot Learning, 2021, pp. 598–608.
  8. M. Laskey, J. Lee, R. Fox, A. Dragan, and K. Goldberg, “DART: Noise injection for robust imitation learning,” in Conf. on Robot Learning, 2017, pp. 143–156.
  9. H. Oh, H. Sasaki, B. Michael, and T. Matsubara, “Bayesian Disturbance Injection: Robust imitation learning of flexible policies for robot manipulation,” Neural Networks, vol. 158, pp. 42–58, 2023.
  10. S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” in Int. Conf. on Artificial Intelli. and Statistics, 2011, pp. 627–635.
  11. M. Kelly, C. Sidrane, K. Driggs-Campbell, and M. J. Kochenderfer, “HG-DAgger: Interactive imitation learning with human experts,” in IEEE Int. Conf. Robot. Autom., 2019, pp. 8077–8083.
  12. R. Hoque, L. Y. Chen, S. Sharma, K. Dharmarajan, B. Thananjeyan, P. Abbeel, and K. Goldberg, “Fleet-DAgger: Interactive robot fleet learning with scalable human supervision,” in Conf. on Robot Learning, 2023, pp. 368–380.
  13. A. J. Nagengast, D. A. Braun, and D. M. Wolpert, “Risk sensitivity in a motor task with speed-accuracy trade-off,” Journal of Neurophysiology, vol. 105, no. 6, pp. 2668–2674, 2011.
  14. H.-I. Lin and C. G. Lee, “Speed-accuracy optimization for skill learning,” in IEEE Int. Conf. Robot. Autom., 2009, pp. 2506–2511.
  15. L. Murphy and P. Newman, “Risky planning: Path planning over costmaps with a probabilistically bounded speed-accuracy tradeoff,” in IEEE Int. Conf. Robot. Autom., 2011, pp. 3727–3732.
  16. C. M. Harris and D. M. Wolpert, “Signal-dependent noise determines motor planning,” Nature, vol. 394, no. 6695, pp. 780–784, 1998.
  17. D. Nix and A. Weigend, “Estimating the mean and variance of the target probability distribution,” in Proceedings of IEEE Int. Conf. on Neural Networks, vol. 1, 1994, pp. 55–60.
  18. D. M. Hamby, “A review of techniques for parameter sensitivity analysis of environmental models,” Environmental Monitoring and Assessment, vol. 32, no. 2, pp. 135–154, 1994.
  19. M. Bain and C. Sammut, “A framework for behavioural cloning,” in Machine Intelli., 1995, pp. 103–129.
  20. G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “OpenAI Gym,” arXiv:1606.01540, 2016.
  21. Y. Zhu, J. Wong, A. Mandlekar, and R. Martín-Martín, “robosuite: A modular simulation framework and benchmark for robot learning,” arXiv:2009.12293, 2020.
  22. A. Majumdar, S. Singh, A. Mandlekar, and M. Pavone, “Risk-sensitive inverse reinforcement learning via coherent risk models,” in Proceedings of Robotics: Sci. and Sys., 2017.
Authors (2)
  1. Hanbit Oh (19 papers)
  2. Takamitsu Matsubara (54 papers)

Summary

Leveraging Demonstrator-perceived Precision for Safe Interactive Imitation Learning of Clearance-limited Tasks

In the paper "Leveraging Demonstrator-perceived Precision for Safe Interactive Imitation Learning of Clearance-limited Tasks" by Hanbit Oh and Takamitsu Matsubara, the authors address the challenge of safe and efficient interactive imitation learning (IIL) in environments with limited clearance, where collision risks are significant. The paper presents a new method, Demonstrator-perceived Precision-aware Interactive Imitation Learning (DPIIL), which aims to enhance the safety of IIL by incorporating human-like precision sensitivity into robot training.

Key Insights and Contributions

The core contribution of the paper is DPIIL, which leverages the precision humans implicitly demonstrate in delicate tasks through a novel interpretation of the speed-accuracy trade-off. This allows a robot to better estimate collision risks in tasks where precision is crucial, such as industrial insertion. When demonstrating, humans inherently adjust their behavior, slowing down where accuracy matters, to avoid collisions; DPIIL exploits this observation to enhance robot learning without explicit environmental models, as illustrated in the sketch below.
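To make the speed-accuracy intuition concrete, the following toy sketch (our own illustration, not code from the paper) estimates per-step speed along a demonstrated end-effector path and treats slow segments as a proxy for high demonstrator-perceived precision. The path, timestep, and inverse-speed proxy are all illustrative assumptions.

```python
# Toy illustration (not from the paper): slow demonstration segments
# are read as regions where the demonstrator perceived high precision.
import numpy as np

def demo_speeds(trajectory, dt=0.01):
    """Finite-difference speed along a demonstrated end-effector path.

    trajectory: (T, 3) array of positions sampled every dt seconds.
    Returns a (T-1,) array of speeds in m/s.
    """
    velocities = np.diff(trajectory, axis=0) / dt
    return np.linalg.norm(velocities, axis=1)

# Example: a demonstrator slows from ~0.2 m/s to ~0.02 m/s near an insertion.
path = np.concatenate([
    np.linspace([0, 0, 0.20], [0, 0, 0.10], 50),   # fast approach
    np.linspace([0, 0, 0.10], [0, 0, 0.09], 50),   # slow, careful insertion
])
speeds = demo_speeds(path)
precision_proxy = 1.0 / (speeds + 1e-6)  # slower motion -> higher perceived precision
print(speeds[:3], speeds[-3:])  # ~0.2 m/s early, ~0.02 m/s late
```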

The authors introduce a probabilistic neural network model to estimate the speed, and thereby the implicit precision, of human demonstrations. This model learns the distribution of speeds in human movements, which is translated directly into a measure of environmental precision. By combining this measure with the policy's epistemic uncertainty, computed from an ensemble of learned policies, DPIIL gauges collision risk and prompts human intervention when necessary.
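A minimal sketch of this gating idea follows; it is not the authors' implementation. It pairs a mean-variance network in the style of Nix and Weigend (reference 17, which the paper cites) for predicting demonstrated speed with a policy ensemble whose disagreement serves as epistemic uncertainty. The network sizes, threshold values, and exact form of the gating rule are illustrative assumptions.

```python
# Sketch of a DPIIL-style intervention gate (illustrative, not the paper's code):
# (1) a mean-variance network predicts the demonstrator's speed at a state;
#     low predicted speed is read as high demonstrator-perceived precision;
# (2) an ensemble of behavior-cloned policies supplies epistemic uncertainty.
# Control is ceded to the human when either signal trips its threshold.
import torch
import torch.nn as nn

class MeanVarianceNet(nn.Module):
    """Predicts mean and variance of demonstrated speed given a state."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, 1)
        self.log_var = nn.Linear(hidden, 1)  # predict log-variance for stability

    def forward(self, s):
        h = self.body(s)
        return self.mu(h), self.log_var(h).exp()

def gaussian_nll(mu, var, y):
    # Negative log-likelihood of y under N(mu, var); trains both heads jointly.
    return 0.5 * (torch.log(var) + (y - mu) ** 2 / var).mean()

def should_cede_control(state, speed_net, policy_ensemble,
                        speed_thresh=0.05, epistemic_thresh=0.1):
    """Hypothetical gating rule: request human help when the demonstrator
    moved slowly here (high perceived precision) or the ensemble members
    disagree (high epistemic uncertainty)."""
    with torch.no_grad():
        mu_speed, _ = speed_net(state)
        actions = torch.stack([pi(state) for pi in policy_ensemble])
        epistemic = actions.var(dim=0).mean()  # disagreement across policies
    return bool(mu_speed.item() < speed_thresh or
                epistemic.item() > epistemic_thresh)
```

In practice, the two thresholds would be tuned so that control is ceded in tight-clearance states (low demonstrated speed) or unfamiliar states (high ensemble disagreement), mirroring the trade-off the paper describes.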

Evaluation and Comparative Analysis

DPIIL was evaluated both in simulation and in real-robot experiments, using a UR5e robotic arm for tasks such as aperture passing and ring threading. The authors compared DPIIL against several baselines, including DAgger, EnsembleDAgger, and ThriftyDAgger. The results indicate that DPIIL not only significantly improved safety and efficiency during training but also yielded superior performance when the learned policies were executed autonomously.

In the aperture-passing simulation, DPIIL achieved higher performance during interactive training than comparable methods, with average success rates reaching up to 96%. Notably, in autonomous execution tests after training, DPIIL's success rate reached 100% in some configurations, demonstrating its efficacy in safely learning complex, precision-intensive tasks.

Implications and Future Directions

The DPIIL method introduced in this paper has significant implications for improving the safety and efficiency of robot training in high-precision tasks without requiring detailed models of the environment. By utilizing human expertise more effectively, it bridges the gap between safety and learning efficiency, paving the way for broader applications in industrial and autonomous systems.

Future research could explore the method's robustness to different types of demonstration noise and to varying levels of human expert skill. Additionally, applying the method to other domains that demand high safety and precision would further validate its versatility and efficiency in diverse real-world settings.

In conclusion, the paper provides a solid foundation for safer IIL practices, lifting constraints imposed by collision risks in environments with narrow clearances. It stands as a noteworthy advancement in the practical applications of imitation learning, contributing meaningfully to the field of robotics.
