On the Benefit of Optimal Transport for Curriculum Reinforcement Learning (2309.14091v2)

Published 25 Sep 2023 in cs.LG

Abstract: Curriculum reinforcement learning (CRL) allows solving complex tasks by generating a tailored sequence of learning tasks, starting from easy ones and subsequently increasing their difficulty. Although the potential of curricula in RL has been clearly shown in various works, it is less clear how to generate them for a given learning environment, resulting in various methods aiming to automate this task. In this work, we focus on framing curricula as interpolations between task distributions, which has previously been shown to be a viable approach to CRL. Identifying key issues of existing methods, we frame the generation of a curriculum as a constrained optimal transport problem between task distributions. Benchmarks show that this way of curriculum generation can improve upon existing CRL methods, yielding high performance in various tasks with different characteristics.
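
The core idea, generating a curriculum by interpolating from an easy source task distribution toward a hard target task distribution, can be made concrete with a small sketch. The example below is an illustrative toy, not the authors' constrained optimal transport formulation: it assumes one-dimensional Gaussian task distributions, for which the Wasserstein-2 geodesic has a closed form (both mean and standard deviation interpolate linearly), and it samples training tasks at each curriculum stage. The task parameter (a goal distance) and all names are hypothetical.

```python
import numpy as np

def w2_gaussian_interpolation(mu_src, sigma_src, mu_tgt, sigma_tgt, alpha):
    """Task distribution at curriculum progress alpha in [0, 1].

    For 1-D Gaussians, the Wasserstein-2 geodesic between
    N(mu_src, sigma_src^2) and N(mu_tgt, sigma_tgt^2) is the Gaussian
    whose mean and standard deviation interpolate linearly.
    """
    mu = (1.0 - alpha) * mu_src + alpha * mu_tgt
    sigma = (1.0 - alpha) * sigma_src + alpha * sigma_tgt
    return mu, sigma

# Hypothetical task parameterization: easy tasks have short goal
# distances, hard tasks have long ones.
easy = (1.0, 0.5)    # (mean, std) of the goal-distance parameter
hard = (10.0, 0.1)

rng = np.random.default_rng(0)
for alpha in np.linspace(0.0, 1.0, 5):
    mu, sigma = w2_gaussian_interpolation(*easy, *hard, alpha)
    tasks = rng.normal(mu, sigma, size=3)  # sample training tasks for this stage
    print(f"alpha={alpha:.2f}: tasks ~ N({mu:.2f}, {sigma:.2f}^2), "
          f"samples={np.round(tasks, 2)}")
```

The paper itself works with more general (e.g. particle-based) task distributions and adds constraints, such as keeping intermediate tasks feasible for the current agent, to the transport problem; the closed-form Gaussian geodesic above only illustrates the easy-to-hard interpolation that such a curriculum realizes.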

Authors (4)
  1. Pascal Klink (12 papers)
  2. Carlo D'Eramo (28 papers)
  3. Jan Peters (253 papers)
  4. Joni Pajarinen (68 papers)
Citations (3)
