IDIL: Imitation Learning of Intent-Driven Expert Behavior (2404.16989v1)
Abstract: When accomplishing a task, human experts exhibit intentional behavior. Their unique intents shape their plans and decisions, so different experts exhibit diverse behaviors when accomplishing the same task. Due to real-world uncertainty and bounded rationality, experts sometimes adjust their intents mid-task, which in turn influences their behavior during execution. This paper introduces IDIL, a novel imitation learning algorithm that mimics these diverse, intent-driven expert behaviors. Our approach iteratively estimates expert intent from heterogeneous demonstrations and then uses these estimates to learn an intent-aware model of expert behavior. Unlike contemporary approaches, IDIL can address sequential tasks with high-dimensional state representations while sidestepping the complexities and drawbacks of adversarial training (a mainstay of related techniques). Our empirical results suggest that models learned by IDIL match or surpass those of recent imitation learning baselines on task-performance metrics. Moreover, because it learns a generative model, IDIL achieves superior performance on intent-inference metrics, which are crucial for human-agent interaction, and captures a broad spectrum of expert behaviors.
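The abstract describes IDIL's core loop only at a high level: iterate between estimating expert intent from heterogeneous demonstrations and learning an intent-aware model of behavior. The sketch below illustrates that general alternating structure on a toy 1-D problem; the domain, the linear intent-conditioned policy, the hard intent assignments, and all class and function names are assumptions chosen for illustration, not the paper's actual models or update rules.

```python
# A minimal toy sketch of the iterative scheme the abstract describes:
# alternate between (1) estimating a discrete latent intent for each
# demonstration and (2) fitting an intent-conditioned behavior model.
# All names and modeling choices here (1-D toy domain, linear policies,
# hard intent assignments) are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)


def make_toy_demos(n_demos=20, horizon=30, n_intents=2):
    """Toy 1-D demonstrations: each expert drifts toward an intent-specific goal."""
    goals = np.array([-1.0, 1.0])[:n_intents]
    demos = []
    for _ in range(n_demos):
        z = rng.integers(n_intents)  # hidden intent of this (simulated) expert
        s, states, actions = 0.0, [], []
        for _ in range(horizon):
            a = 0.2 * (goals[z] - s) + 0.05 * rng.normal()
            states.append(s)
            actions.append(a)
            s = s + a
        demos.append((np.array(states), np.array(actions)))
    return demos


class IntentConditionedPolicy:
    """Linear policy a = gain[z] * (goal[z] - s), one (gain, goal) pair per intent."""

    def __init__(self, n_intents):
        self.goals = rng.normal(size=n_intents)
        self.gains = np.full(n_intents, 0.1)

    def fit(self, demos, intents):
        """Refit each intent's parameters on the demonstrations assigned to it."""
        for z in range(len(self.goals)):
            pairs = [(s, a) for (s, a), zi in zip(demos, intents) if zi == z]
            if not pairs:
                continue
            S = np.concatenate([s for s, _ in pairs])
            A = np.concatenate([a for _, a in pairs])
            # a ≈ gain * (goal - s)  =>  a ≈ (-gain) * s + (gain * goal)
            X = np.stack([-S, np.ones_like(S)], axis=1)
            k, b = np.linalg.lstsq(X, A, rcond=None)[0]
            self.gains[z] = max(float(k), 1e-3)
            self.goals[z] = float(b) / self.gains[z]

    def neg_log_lik(self, states, actions, z, sigma=0.1):
        """Gaussian action negative log-likelihood of one trajectory under intent z (up to a constant)."""
        pred = self.gains[z] * (self.goals[z] - states)
        return float(np.sum((actions - pred) ** 2) / (2 * sigma**2))


def estimate_intents(demos, policy):
    """Assign each demonstration the intent under which its actions are most likely."""
    return [
        int(np.argmin([policy.neg_log_lik(s, a, z) for z in range(len(policy.goals))]))
        for (s, a) in demos
    ]


demos = make_toy_demos()
policy = IntentConditionedPolicy(n_intents=2)
intents = [int(rng.integers(2)) for _ in demos]  # random initial intent labels
for _ in range(10):  # iterate: fit the intent-aware model, then re-estimate intents
    policy.fit(demos, intents)
    intents = estimate_intents(demos, policy)
print("recovered goals:", np.round(policy.goals, 2))
print("intent assignments (first 10 demos):", intents[:10])
```

In this hard-assignment form the loop resembles a k-means/EM-style procedure on trajectories; the paper's actual algorithm targets sequential tasks with high-dimensional states and avoids adversarial training, neither of which this toy sketch attempts to reproduce.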