Adversarial Imitation Learning from Visual Observations using Latent Information (2309.17371v3)
Abstract: We focus on the problem of imitation learning from visual observations, where the learning agent has access to videos of experts as its sole learning source. The challenges of this framework include the absence of expert actions and the partial observability of the environment, since ground-truth states can only be inferred from pixels. To tackle this problem, we first conduct a theoretical analysis of imitation learning in partially observable environments. We establish upper bounds on the suboptimality of the learning agent in terms of the divergence between the latent state-transition distributions of the expert and the agent. Motivated by this analysis, we introduce an algorithm called Latent Adversarial Imitation from Observations, which combines off-policy adversarial imitation techniques with a latent representation of the agent's state learned from sequences of observations. In experiments on high-dimensional continuous robotic tasks, we show that our model-free approach in latent space matches state-of-the-art performance. Additionally, we show how our method can be used to improve the efficiency of reinforcement learning from pixels by leveraging expert videos. To ensure reproducibility, we provide free access to our code.
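To make the recipe in the abstract concrete, the sketch below illustrates the kind of pipeline it describes: an encoder maps stacks of pixel observations to latent states, a discriminator distinguishes expert from agent latent state transitions, and the discriminator's output supplies a proxy reward for an off-policy learner. This is a minimal sketch under our own assumptions; the class names (`LatentEncoder`, `TransitionDiscriminator`), the architectures, and the GAIL-style reward are hypothetical stand-ins, not the authors' released implementation.

```python
# Minimal sketch of the latent adversarial imitation pipeline described
# in the abstract. All names, architectures, and the GAIL-style reward
# are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentEncoder(nn.Module):
    """Maps a stack of consecutive pixel observations to a latent state z_t."""

    def __init__(self, in_channels: int, latent_dim: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(latent_dim)  # infers input size on first call

    def forward(self, obs_stack: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(obs_stack))


class TransitionDiscriminator(nn.Module):
    """Scores latent transitions (z_t, z_{t+1}): expert-like vs. agent-like."""

    def __init__(self, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, z: torch.Tensor, z_next: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, z_next], dim=-1))


def discriminator_step(disc, opt, expert_z, expert_z_next, agent_z, agent_z_next):
    """One adversarial update: expert transitions labeled 1, agent transitions 0.

    Latents are assumed detached from the encoder, so only the
    discriminator's parameters receive gradients here.
    """
    expert_logits = disc(expert_z, expert_z_next)
    agent_logits = disc(agent_z, agent_z_next)
    loss = (
        F.binary_cross_entropy_with_logits(expert_logits, torch.ones_like(expert_logits))
        + F.binary_cross_entropy_with_logits(agent_logits, torch.zeros_like(agent_logits))
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


def imitation_reward(disc, z, z_next):
    """GAIL-style proxy reward: large when a transition looks expert-like."""
    with torch.no_grad():
        return F.softplus(disc(z, z_next))  # equals -log(1 - sigmoid(logits))
```

In a full training loop, this imitation reward would stand in for the environment reward inside an off-policy actor-critic update, with the encoder trained jointly from sequences of observations; expert latents would come from encoding the expert videos, since no expert actions are available.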
Authors: Vittorio Giammarino, James Queeney, Ioannis Ch. Paschalidis