Align Your Intents: Offline Imitation Learning via Optimal Transport (2402.13037v2)

Published 20 Feb 2024 in cs.LG and cs.AI

Abstract: Offline Reinforcement Learning (RL) addresses the problem of sequential decision-making by learning an optimal policy from pre-collected data, without interacting with the environment. So far it has remained somewhat impractical, because the reward is rarely known explicitly and is hard to distill retrospectively. Here, we show that an imitating agent can still learn the desired behavior merely from observing the expert, despite the absence of explicit rewards or action labels. In our method, AILOT (Aligned Imitation Learning via Optimal Transport), we use a special representation of states in the form of intents that incorporate pairwise spatial distances within the data. Given such representations, we define an intrinsic reward function via the optimal transport distance between the expert's and the agent's trajectories. We report that AILOT outperforms state-of-the-art offline imitation learning algorithms on the D4RL benchmarks and improves the performance of other offline RL algorithms by dense reward relabelling in sparse-reward tasks.
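
The core computational step described in the abstract, turning an optimal transport alignment between an agent trajectory and an expert trajectory into a dense intrinsic reward, can be illustrated with a small, self-contained sketch. This is not the authors' implementation: AILOT relies on learned intent embeddings, whereas the snippet below assumes plain NumPy, uniform empirical measures over the two trajectories, a cosine cost on placeholder state features, and an entropy-regularized Sinkhorn solver.

```python
import numpy as np

def sinkhorn_plan(cost, eps=0.05, n_iters=200):
    """Entropy-regularized OT plan between two uniform empirical measures.

    cost: (n, m) pairwise cost matrix between agent and expert states.
    Returns the (n, m) transport plan.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform weights over agent states
    b = np.full(m, 1.0 / m)          # uniform weights over expert states
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):         # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def ot_intrinsic_rewards(agent_feats, expert_feats, eps=0.05):
    """Dense per-step rewards: the negative transport cost each agent state
    incurs when the agent trajectory is aligned to the expert trajectory."""
    # Cosine cost between feature vectors; the intent embeddings from the
    # paper would be plugged in here instead of raw features.
    a = agent_feats / np.linalg.norm(agent_feats, axis=1, keepdims=True)
    e = expert_feats / np.linalg.norm(expert_feats, axis=1, keepdims=True)
    cost = 1.0 - a @ e.T
    plan = sinkhorn_plan(cost, eps)
    # Each agent state's reward is minus the plan-weighted cost of its row,
    # rescaled by trajectory length so magnitudes stay comparable.
    return -len(agent_feats) * np.sum(plan * cost, axis=1)

# Toy usage with random features standing in for intent embeddings.
rng = np.random.default_rng(0)
agent = rng.normal(size=(50, 16))
expert = rng.normal(size=(60, 16))
rewards = ot_intrinsic_rewards(agent, expert)
print(rewards.shape)  # (50,)
```

These per-step rewards could then relabel an offline dataset so that any standard offline RL algorithm can be trained on it, which is the general recipe the abstract refers to as dense reward relabelling.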

Authors (4)
  1. Maksim Bobrin (3 papers)
  2. Nazar Buzun (11 papers)
  3. Dmitrii Krylov (4 papers)
  4. Dmitry V. Dylov (34 papers)
Citations (2)
