
PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm

Published 11 Jun 2023 in cs.LG (arXiv:2306.06637v2)

Abstract: In this paper, we propose PACER, the first fully push-forward-based distributional reinforcement learning algorithm, which consists of a distributional critic, a stochastic actor, and a sample-based encourager. Specifically, the push-forward operator is leveraged in both the critic and the actor to model return distributions and stochastic policies, respectively, giving the two components equal modeling capability and thereby enhancing their synergistic performance. Since the density function of a push-forward policy is infeasible to obtain, novel sample-based regularizers are integrated into the encourager to incentivize efficient exploration and reduce the risk of becoming trapped in local optima. Moreover, a sample-based stochastic utility value policy gradient is established for the push-forward policy update, circumventing the explicit policy density required by existing REINFORCE-based stochastic policy gradients. As a result, PACER fully exploits the modeling capability of the push-forward operator and can explore a broader class of policies than the limited policy classes (i.e., Gaussians) used in existing distributional actor-critic algorithms. We validate the critical role of each component of our algorithm with extensive empirical studies. Experimental results demonstrate the superiority of our algorithm over the state of the art.
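To make the abstract's central construction concrete, below is a minimal sketch, assuming PyTorch; every name in it (`PushForwardPolicy`, `PushForwardCritic`, `noise_dim`, `utility`, ...) is an illustrative assumption, not the authors' implementation. It shows the two push-forward components and why no policy density is needed: the policy is the push-forward of Gaussian noise through a network, the critic pushes uniform "quantile" noise forward into return samples, and the actor update differentiates a utility of those samples directly through the sampled actions.

```python
# Minimal sketch, assuming PyTorch. All names are illustrative assumptions,
# not the paper's implementation.
import torch
import torch.nn as nn

class PushForwardPolicy(nn.Module):
    """Pushes Gaussian noise forward through a network to produce actions.

    Sampling from the resulting policy is trivial, but its density has no
    closed form -- which is why sample-based regularizers and a
    sample-based policy gradient are needed.
    """
    def __init__(self, state_dim, action_dim, noise_dim=8, hidden=256):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state, n_samples=4):
        # state: (batch, state_dim) -> actions: (batch, n_samples, action_dim)
        eps = torch.randn(state.shape[0], n_samples, self.noise_dim)
        s = state.unsqueeze(1).expand(-1, n_samples, -1)
        return self.net(torch.cat([s, eps], dim=-1))

class PushForwardCritic(nn.Module):
    """Pushes uniform quantile noise forward into samples of the return
    distribution Z(s, a), in the spirit of implicit quantile networks."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, n_taus=16):
        # action: (batch, n_samples, action_dim) -> (batch, n_samples, n_taus)
        b, n, _ = action.shape
        taus = torch.rand(b, n, n_taus, 1)
        s = state.unsqueeze(1).unsqueeze(2).expand(b, n, n_taus, -1)
        a = action.unsqueeze(2).expand(-1, -1, n_taus, -1)
        return self.net(torch.cat([s, a, taus], dim=-1)).squeeze(-1)

def utility_policy_loss(policy, critic, states,
                        utility=lambda z: z.mean(dim=-1)):
    """Sample-based utility value policy gradient (pathwise form).

    Actions are a differentiable function of noise, so the gradient of a
    utility of the critic's return samples flows back into the policy
    without ever evaluating log pi(a|s), unlike REINFORCE-style gradients.
    """
    actions = policy(states)        # (batch, n_samples, action_dim)
    z = critic(states, actions)     # (batch, n_samples, n_taus)
    return -utility(z).mean()       # gradient ascent on the utility
```

Swapping `utility` for a risk measure estimated from the quantile samples (for example, the mean of the lowest quantiles for a CVaR-style objective) turns the same pathwise update into a risk-sensitive one; the encourager's sample-based exploration regularizer would be added to this loss, likewise computed purely from sampled actions.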

Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. 
In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. 
In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
  2. Armengol Urpí N, Curi S, Krause A (2021) Risk-averse offline reinforcement learning. In: ICLR 2021, OpenReview Balbás et al [2009] Balbás A, Garrido J, Mayoral S (2009) Properties of distortion risk measures. Methodology and Computing in Applied Probability 11(3):385–399 Baptista et al [2023] Baptista R, Hosseini B, Kovachki NB, et al (2023) An approximation theory framework for measure-transport sampling algorithms. arXiv preprint arXiv:230213965 Barth-Maron et al [2018] Barth-Maron G, Hoffman MW, Budden D, et al (2018) Distributed distributional deterministic policy gradients. In: ICLR 2018 Bellemare et al [2017] Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. 
NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Balbás A, Garrido J, Mayoral S (2009) Properties of distortion risk measures. Methodology and Computing in Applied Probability 11(3):385–399 Baptista et al [2023] Baptista R, Hosseini B, Kovachki NB, et al (2023) An approximation theory framework for measure-transport sampling algorithms. arXiv preprint arXiv:230213965 Barth-Maron et al [2018] Barth-Maron G, Hoffman MW, Budden D, et al (2018) Distributed distributional deterministic policy gradients. In: ICLR 2018 Bellemare et al [2017] Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. 
In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Baptista R, Hosseini B, Kovachki NB, et al (2023) An approximation theory framework for measure-transport sampling algorithms. arXiv preprint arXiv:230213965 Barth-Maron et al [2018] Barth-Maron G, Hoffman MW, Budden D, et al (2018) Distributed distributional deterministic policy gradients. In: ICLR 2018 Bellemare et al [2017] Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. 
In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Barth-Maron G, Hoffman MW, Budden D, et al (2018) Distributed distributional deterministic policy gradients. In: ICLR 2018 Bellemare et al [2017] Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. 
In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. 
In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. 
Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. 
In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. 
Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. 
IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
  4. Baptista R, Hosseini B, Kovachki NB, et al (2023) An approximation theory framework for measure-transport sampling algorithms. arXiv preprint arXiv:230213965 Barth-Maron et al [2018] Barth-Maron G, Hoffman MW, Budden D, et al (2018) Distributed distributional deterministic policy gradients. In: ICLR 2018 Bellemare et al [2017] Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Barth-Maron G, Hoffman MW, Budden D, et al (2018) Distributed distributional deterministic policy gradients. In: ICLR 2018 Bellemare et al [2017] Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML 2017, PMLR, pp 449–458 Bellemare et al [2023] Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Bellemare MG, Dabney W, Rowland M (2023) Distributional Reinforcement Learning. MIT Press, http://www.distributional-rl.org Burda et al [2019] Burda Y, Edwards H, Storkey A, et al (2019) Exploration by random network distillation. In: Seventh International Conference on Learning Representations, pp 1–17 Choi et al [2021] Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. 
In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. 
Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. 
NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Carnegie Mellon University Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. 
IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. 
In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. 
Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
  9. Choi J, Dance C, Kim Je, et al (2021) Risk-conditioned distributional soft actor-critic for risk-sensitive navigation. In: ICRA 2021, IEEE, pp 8337–8344 Chow et al [2015] Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. J Mach Learn Res 18(1):6070–6120 Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE signal processing magazine 35(1):53–65 Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105 Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Chow Y, Tamar A, Mannor S, et al (2015) Risk-sensitive and robust decision-making: a cvar optimization approach. Advances in neural information processing systems 28 Chow et al [2017] Chow Y, Ghavamzadeh M, Janson L, et al (2017) Risk-constrained reinforcement learning with percentile risk criteria. 
J Mach Learn Res 18(1):6070–6120
Creswell et al [2018] Creswell A, White T, Dumoulin V, et al (2018) Generative adversarial networks: An overview. IEEE Signal Processing Magazine 35(1):53–65
Dabney et al [2018a] Dabney W, Ostrovski G, Silver D, et al (2018a) Implicit quantile networks for distributional reinforcement learning. In: ICML 2018, PMLR, pp 1096–1105
Dabney et al [2018b] Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018
Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE Transactions on Neural Networks and Learning Systems
Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with Gaussian processes. In: Proceedings of the 22nd International Conference on Machine Learning, pp 201–208
Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596
Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144
Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361
Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905
Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in Neural Information Processing Systems 28
Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114
Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. In: NIPS 2014
Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in Neural Information Processing Systems 12
Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: ICML 2020, PMLR, pp 5556–5566
Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971
Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524
Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) DSAC: Distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:2004.14547
Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of Uncertainty Quantification 1:2
Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Morimura [2010] Morimura T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010
Nam et al [2021] Nam DW, Kim Y, Park CY (2021) GMAC: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936
Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature Communications 13(1):2064
Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682
Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for Markov chain Monte Carlo. Adv Comput Math 45:2321–2348
Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607
Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward MDPs. Machine Learning 105(3):367–417
Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693
Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37
Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) TD algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif Intell 16(3):353–362
Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438
Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968
Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688
Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: ICLR 2018
Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction. MIT Press
Villani et al [2009] Villani C, et al (2009) Optimal Transport: Old and New, vol 338. Springer
Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. In: NeurIPS 2019
Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147
Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. 
NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010
32. Nam et al [2021] Nam DW, Kim Y, Park CY (2021) GMAC: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936
33. Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature Communications 13(1):2064
34. Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682
35. Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for Markov chain Monte Carlo. Adv Comput Math 45:2321–2348
36. Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607
37. Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward MDPs. Machine Learning 105(3):367–417
38. Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693
39. Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37
40. Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) TD algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362
41. Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438
42. Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968
43. Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688
44. Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018
45. Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT Press
46. Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer
47. Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. In: NeurIPS 2019, vol 32
48. Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147
49. Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
  14. Dabney W, Rowland M, Bellemare M, et al (2018b) Distributional reinforcement learning with quantile regression. In: AAAI 2018 Duan et al [2021] Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
  15. Duan J, Guan Y, Li SE, et al (2021) Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE transactions on neural networks and learning systems Engel et al [2005] Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on Machine learning, pp 201–208 Fujimoto et al [2018] Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. 
In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. 
In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. 
Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. 
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
  17. Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: ICML 2018, PMLR, pp 1587–1596 Goodfellow et al [2020] Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. 
In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. 
In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. 
In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
  18. Goodfellow I, Pouget-Abadie J, Mirza M, et al (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144 Haarnoja et al [2017] Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Tang H, Abbeel P, et al (2017) Reinforcement learning with deep energy-based policies. In: ICML 2017, PMLR, pp 1352–1361 Haarnoja et al [2018] Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. 
nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
  20. Haarnoja T, Zhou A, Hartikainen K, et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:181205905 Heess et al [2015] Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Heess N, Wayne G, Silver D, et al (2015) Learning continuous control policies by stochastic value gradients. Advances in neural information processing systems 28 Kingma and Welling [2013] Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. 
SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:13126114 Kingma et al [2014] Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. 
arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. NIPS 2014 27 Konda and Tsitsiklis [1999] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. 
Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12 Kuznetsov et al [2020] Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566 Lillicrap et al [2015] Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. 
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971 Lindenberg et al [2022] Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524 Ma et al [2020] Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ma X, Xia L, Zhou Z, et al (2020) Dsac: distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547 Marzouk et al [2016] Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
22. Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114
23. Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. In: NIPS 2014, vol 27
24. Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in Neural Information Processing Systems 12
25. Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: ICML 2020, PMLR, pp 5556–5566
26. Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971
27. Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524
28. Ma X, Xia L, Zhou Z, et al (2020) DSAC: Distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:2004.14547
29. Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of Uncertainty Quantification 1:2
30. Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
31. Morimura T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010
32. Nam DW, Kim Y, Park CY (2021) GMAC: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936
33. Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature Communications 13(1):2064
34. Parno MD, Marzouk YM (2018) Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682
35. Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for Markov chain Monte Carlo. Advances in Computational Mathematics 45:2321–2348
36. Peyré G, Cuturi M (2019) Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning 11(5-6):355–607
37. Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward MDPs. Machine Learning 105(3):367–417
38. Prashanth L, Fu MC (2022) Risk-sensitive reinforcement learning via policy gradient search. Foundations and Trends in Machine Learning 15(5):537–693
39. Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37
40. Sato M, Kimura H, Kobayashi S (2001) TD algorithm for the variance of return and mean-variance reinforcement learning. Transactions of the Japanese Society for Artificial Intelligence 16(3):353–362
41. Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438
42. Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968
43. Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688
44. Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: ICLR 2018
45. Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction. MIT Press
46. Villani C (2009) Optimal Transport: Old and New, vol 338. Springer
47. Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. In: NeurIPS 2019, vol 32
48. Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147
49. Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University
  23. Kingma DP, Mohamed S, Jimenez Rezende D, et al (2014) Semi-supervised learning with deep generative models. Advances in Neural Information Processing Systems 27 (NIPS 2014)
  24. Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Advances in neural information processing systems 12
  25. Kuznetsov A, Shvechikov P, Grishin A, et al (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International Conference on Machine Learning, PMLR, pp 5556–5566
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of uncertainty quantification 1:2 Mnih et al [2015] Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. 
Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. 
NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
  26. Lillicrap TP, Hunt JJ, Pritzel A, et al (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971
Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. 
In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
27. Lindenberg B, Nordqvist J, Lindahl KO (2022) Conjugated discrete distributions for distributional reinforcement learning. In: AAAI 2022, pp 7516–7524
28. Ma X, Xia L, Zhou Z, et al (2020) DSAC: Distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:2004.14547
29. Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of Uncertainty Quantification 1:2
30. Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
31. Morimura T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010
32. Nam DW, Kim Y, Park CY (2021) GMAC: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936
33. Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature Communications 13(1):2064
34. Parno MD, Marzouk YM (2018) Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682
35. Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for Markov chain Monte Carlo. Adv Comput Math 45:2321–2348
36. Peyré G, Cuturi M (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607
37. Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward MDPs. Machine Learning 105(3):367–417
38. Prashanth L, Fu MC (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693
39. Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37
40. Sato M, Kimura H, Kobayashi S (2001) TD algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif Intell 16(3):353–362
41. Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438
42. Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968
43. Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688
44. Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018
45. Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT Press
46. Villani C (2009) Optimal transport: old and new, vol 338. Springer
47. Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. In: Advances in Neural Information Processing Systems 32
48. Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147
49. Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. 
Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
  28. Ma X, Xia L, Zhou Z, et al (2020) DSAC: Distributional soft actor critic for risk-sensitive reinforcement learning. arXiv preprint arXiv:200414547
  29. Marzouk Y, Moselhy T, Parno M, et al (2016) Sampling via measure transport: An introduction. Handbook of Uncertainty Quantification 1:2
  30. Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
  31. Morimura T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010
  32. Nam DW, Kim Y, Park CY (2021) GMAC: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936
  33. Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature Communications 13(1):2064
  34. Parno MD, Marzouk YM (2018) Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682
  35. Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for Markov chain Monte Carlo. Adv Comput Math 45:2321–2348
  36. Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607
  37. Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward MDPs. Machine Learning 105(3):367–417
  38. Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693
  39. Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37
  40. Sato M, Kimura H, Kobayashi S (2001) TD algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif Intell 16(3):353–362
  41. Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438
  42. Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk-averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968
  43. Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688
  44. Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018
  45. Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT Press
  46. Villani C, et al (2009) Optimal transport: Old and new, vol 338. Springer
  47. Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. Advances in Neural Information Processing Systems 32
  48. Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147
  49. Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University
Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. 
In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 
Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. 
Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
  30. Mnih V, Kavukcuoglu K, Silver D, et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529–533 MORIMURA [2010] MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University MORIMURA T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010 Nam et al [2021] Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. 
Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936 Ororbia and Kifer [2022] Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. 
Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064 Parno and Marzouk [2018] Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. 
In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682 Peherstorfer and Marzouk [2019] Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. 
Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348 Peyré et al [2019] Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607 Prashanth and Ghavamzadeh [2016] Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward mdps. Machine Learning 105(3):367–417 Prashanth et al [2022] Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37 Sato et al [2001] Sato M, Kimura H, Kobayashi S (2001) Td algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362 Schulman et al [2015] Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438 Singh et al [2020] Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968 Singh et al [2022] Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688 Sukhbaatar et al [2018] Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018 Sutton and Barto [2018] Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press Villani et al [2009] Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer Yang et al [2019] Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. NeurIPS 2019 32 Yue et al [2020] Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147 Ziebart [2010] Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693 Rowland et al [2018] Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. 
In: AISTATS 2018, PMLR, pp 29–37
  31. Morimura T (2010) Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence, 2010
  32. Nam DW, Kim Y, Park CY (2021) Gmac: A distributional perspective on actor-critic framework. In: ICML 2021, PMLR, pp 7927–7936
  33. Ororbia A, Kifer D (2022) The neural coding framework for learning generative models. Nature communications 13(1):2064
  34. Parno MD, Marzouk YM (2018) Transport map accelerated markov chain monte carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2):645–682
  35. Peherstorfer B, Marzouk Y (2019) A transport-based multifidelity preconditioner for markov chain monte carlo. Adv Comput Math 45:2321–2348
  36. Peyré G, Cuturi M, et al (2019) Computational optimal transport: With applications to data science. Found Trends Mach Learn 11(5-6):355–607
  37. Prashanth L, Ghavamzadeh M (2016) Variance-constrained actor-critic algorithms for discounted and average reward MDPs. Machine Learning 105(3):367–417
  38. Prashanth L, Fu MC, et al (2022) Risk-sensitive reinforcement learning via policy gradient search. Found Trends Mach Learn 15(5):537–693
  39. Rowland M, Bellemare M, Dabney W, et al (2018) An analysis of categorical distributional reinforcement learning. In: AISTATS 2018, PMLR, pp 29–37
  40. Sato M, Kimura H, Kobayashi S (2001) TD algorithm for the variance of return and mean-variance reinforcement learning. Trans Jpn Soc Artif 16(3):353–362
  41. Schulman J, Moritz P, Levine S, et al (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:150602438
  42. Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for Dynamics and Control, pp 958–968
  43. Singh R, Lee K, Chen Y (2022) Sample-based distributional policy gradient. In: Learning for Dynamics and Control Conference, PMLR, pp 676–688
  44. Sukhbaatar S, Lin Z, Kostrikov I, et al (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: 6th International Conference on Learning Representations, ICLR 2018
  45. Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT Press
  46. Villani C, et al (2009) Optimal transport: old and new, vol 338. Springer
  47. Yang D, Zhao L, Lin Z, et al (2019) Fully parameterized quantile function for distributional reinforcement learning. Advances in Neural Information Processing Systems 32
  48. Yue Y, Wang Z, Zhou M (2020) Implicit distributional reinforcement learning. Advances in Neural Information Processing Systems 33:7135–7147
  49. Ziebart BD (2010) Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University
