Active Third-Person Imitation Learning (2312.16365v1)
Abstract: We consider the problem of third-person imitation learning with the additional challenge that the learner must select the perspective from which it observes the expert. In our setting, each perspective provides only limited information about the expert's behavior, and the learning agent must carefully select and combine information from different perspectives to achieve competitive performance. This setting is inspired by real-world imitation learning applications: in robotics, for example, a robot might observe a human demonstrator via a camera and receive different information depending on the camera's position. We formalize this active third-person imitation learning problem, theoretically analyze its characteristics, and propose a generative adversarial network-based active learning approach. Empirically, we demonstrate that our proposed approach can effectively learn from expert demonstrations, and we explore how different architectural choices affect the learner's performance.
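The core loop described in the abstract — repeatedly choosing a viewing perspective, observing partial information about the expert, and crediting perspectives by how much they help the adversarial learner — can be illustrated with a minimal sketch. This is not the paper's algorithm: the environment, the two "camera" perspectives, and the separability proxy standing in for a discriminator-derived signal are all illustrative assumptions; the perspective selection uses a standard UCB bandit rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (all quantities are illustrative, not from the paper):
# the expert's state is 2-D, but each "camera" perspective reveals only
# one coordinate. The learner picks a perspective each round with a UCB
# rule, using a discriminator-style separability signal as the (noisy)
# informativeness of that perspective.

K = 2                      # number of perspectives
counts = np.zeros(K)       # how often each perspective was chosen
values = np.zeros(K)       # running mean informativeness per perspective

def informativeness(k):
    # Stand-in for a discriminator-derived signal: sample toy expert and
    # agent states, project them onto the coordinate visible from
    # perspective k, and use the mean gap as a separability proxy.
    # Expert and agent differ mostly along coordinate 0, so perspective 0
    # is the informative one.
    expert_states = rng.normal([1.0, 0.1], 0.5, size=(50, 2))
    agent_states = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
    return abs(expert_states[:, k].mean() - agent_states[:, k].mean())

T = 200
for t in range(1, T + 1):
    if t <= K:
        k = t - 1          # play each perspective once to initialize
    else:
        ucb = values + np.sqrt(2 * np.log(t) / counts)
        k = int(np.argmax(ucb))
    r = informativeness(k)
    counts[k] += 1
    values[k] += (r - values[k]) / counts[k]

print(counts)  # the informative perspective should dominate
```

In the paper's actual setting the informativeness signal would come from the adversarial discriminator trained on the partial observations, and the agent's policy would be updated from the resulting reward; the sketch only shows how an active learner can concentrate its observation budget on informative viewpoints.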